METHODS FOR DIAGNOSING RCC AND OTHER SOLID TUMORS 



[0001] This application incorporates by reference the entire disclosure of U.S. 

Provisional Application Serial No. 60/427,982, filed November 21, 2002 and entitled 
"Methods for Diagnosing RCC and/or Solid Tumors." This application also incorporates by 
reference the entire disclosure of U.S. Provisional Application Serial No. 60/459,782, filed 
April 3, 2003 and entitled "Methods for Diagnosing RCC and/or Solid Tumors." In 
addition, this application incorporates by reference all materials recorded in compact discs 
"Copy 1" and "Copy 2." Each of the compact discs includes the sequence listing file 
entitled "AM101080L Sequence Listing.ST25.txt" (2,206 KB, created on November 20, 
2003). 

TECHNICAL FIELD 

[0002] This invention relates to methods, systems and equipment for diagnosing 

RCC and other solid tumors. 

BACKGROUND 

[0003] Renal cell carcinoma (RCC) comprises the majority of all cases of kidney 

cancer and is one of the most common cancers in industrialized countries. When RCC is detected 
early, radical nephrectomy can result in an excellent survival rate for RCC patients. 
However, the survival rate for patients with metastasized RCC tumors is reduced 
dramatically. Therefore, there is a need to provide methodologies, systems and equipment 
for the early diagnosis of RCC. 

[0004] RCC patients frequently have non-specific symptoms or are completely 

asymptomatic. In fact, a significant percentage of renal lesions are incidentally detected by 
non-invasive imaging techniques. General screening methods for RCC are available, but 
these methods lack sufficient sensitivity and specificity for broad application. Recent U.S. 
Patent No. 6,087,098 generally describes an RT-PCR based method for detecting the 
expression of the MN gene in peripheral blood samples. The MN protein is believed to be a 
marker of malignant renal cells. Therefore, detection of the MN gene expression in the 
peripheral blood suggests the presence of RCC. 

[0005] The present invention represents a significant advance in the diagnosis of 

RCC and/or other solid tumors such as prostate cancer and head/neck cancer. The 
diagnostic test of the present invention relies on the detection of gene expression patterns in 
peripheral blood cells rather than in tumor cells themselves. As such, the present invention 
allows widespread screening for early stages of solid tumor progression. 

SUMMARY OF THE INVENTION 

[0006] The present invention identifies numerous disease genes that are 

differentially expressed in the peripheral blood of patients having RCC or other solid tumors 
as compared to disease-free humans. These disease genes can be used as surrogate markers 
for detecting the presence or absence of RCC or other solid tumors. 

[0007] In accordance with one aspect of the present invention, a method is provided 

that is useful for diagnosis of RCC and other solid tumors. The method comprises the steps 
of providing at least one peripheral blood sample of a human, and comparing an expression 
profile of one or more genes in the at least one peripheral blood sample to at least one 
reference expression profile of the one or more genes. Each of the one or more genes is 
differentially expressed in PBMCs of patients having a solid tumor as compared to PBMCs 
of disease-free humans, provided that if the one or more genes consist of only one gene, the 
gene is not selected from the group consisting of IL1B, IL6, MMP-9 and FCGR3B, and 
further provided that if the one or more genes consist of two genes, the two genes are not 
IL1B and IL6. 

[0008] The peripheral blood sample can be a whole blood sample or a sample 

comprising enriched peripheral blood mononuclear cells (PBMCs). Other peripheral blood 
samples can also be used. The solid tumor can be, for example, RCC, prostate cancer, or 
head/neck cancer. The human being investigated can have the solid tumor, or can be free from 
the solid tumor or other diseases. 

[0009] The reference expression profile(s) can include an expression profile of the 

one or more genes in peripheral blood samples of disease-free humans. The reference 
expression profile(s) can also include an expression profile of the one or more genes in 
peripheral blood samples of patients having the solid tumor. In addition, the reference 
expression profile(s) can further include an expression profile of the one or more genes in 
peripheral blood samples of patients having another solid tumor. The expression profile of 
the human being investigated can be compared to different reference expression profiles 
using a weighted voting algorithm. 
[0010] The expression profile of the human being investigated and the reference 

expression profile(s) can be determined using quantitative RT-PCR, Northern Blot, in situ 
hybridization, Southern Blot, slot-blotting, nuclease protection assay, or nucleic acid arrays. 
The expression profiles can also be determined using immunoassays such as ELISA 
(enzyme-linked immunosorbent assay), RIA (radioimmunoassay), FACS (fluorescence- 
activated cell sorter), or Western Blot. In addition, methods based on 2-dimensional SDS- 
polyacrylamide gel electrophoresis can be used. 

[0011] In a preferred embodiment, the one or more genes include at least 1, 2, 3, 4, 

5, 6, 7, 8, 9, 10, 12, 14, 16, 18, 20, or more genes selected from Gene-Table-4. In another 
preferred embodiment, the one or more genes include at least 1, 2, 3, 4, 6, 8, 10, 12, 14, 16, 
18, 20, or more genes selected from Table-6. In yet another preferred embodiment, the one 
or more genes include a classifier identifiable using a two-class or multi-class correlation 
metric algorithm. 

[0012] In still another embodiment, the one or more genes include at least 1, 2, 3, 4, 

5, 6, 7, 8, 9, 10, 12, 14, 16, 18, or 20 genes selected from the group consisting of: EEF1A2, 
TLR2, BRF2, LGALS3, SNRPG, DKFZP586E1621, NUMA1, SOD2, AKR1B1, DUSP6, 
SMARCE1, KIAA0669, MSF, IL1RN, PTMA, KIAA0410, PSMD3, T54, C1QBP, and 
OSR1. 

[0013] In a further embodiment, the one or more genes include at least 1, 2, 3, 4, 5, 

6, 7, 8, 9, 10, 12, 14, 16, 18, or 20 genes selected from the group consisting of: CD44, 
KIAA0410, MARCO, MAP3K8, NSP-CL, PIP5K1C, NRG1, RAB31, LGALS3, MEF2D, 
ITGA7, LHFPL2, ETS2, KHSRP, ENIGMA, UNK_AF038187, RAB13, TLR2, T54 and 
DUSP6. 

[0014] In yet another embodiment, the one or more genes include at least 1, 2, 3, 4, 

5, 6, 7, 8, 9, 10, 12, 14, 16, 18, or 20 genes selected from the group consisting of: CD44, 
CRADD, CCRL2, KIAA0837, KIAA0707, KIAA1113, EREG, UNK_AL050119, PPARD, 
CTSL, ATP2B1, UNK_AF052115, MITF, STAT3, KIAA0410, TPD52L2, 
UNK_AI732885, MARCO, LOC64116, and PDNP2. 

[0015] In still yet another embodiment, the one or more genes include at least 1, 2, 

3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 16, 18, 20, or more genes selected from the group consisting 
of: FABP5, SCYA20, ADM, COPEB, FCGR3B, UNK_M62896, FN1, HMOX1, ITGA7, 
DGCR5, CBP2, SLC1A4, MMP9, SLC16A3, LILRB3, FCGR1A, LHFPL2, PLEC1, 
S100A11, SPOP, CCR1, TLR2 and KIAA0750. 
[0016] In another embodiment, the one or more genes include at least 1, 2, 3, 4, 5, 6, 

7, 8, 9, 10, 12, 14, 16, 18, 20, or more genes selected from the group consisting of: ADM, 
COPEB, AQP9, PTGS2, STIP1, SOD2, PDXK, IL1RN, ANXA5, IFIT4, IL1B, GRO1, 
PLAUR, NP, MMP9, SLC16A3, LILRB3, FCGR1A, LHFPL2, PLEC1, S100A11, SPOP, 
CCR1, TLR2, KIAA0750, CDC34, POLR2J, ETS2, MAD, GPR3, PIP5K1C, PRF1, 
PSMA7, INPP4A, TCFL1, DGAT, S100P, DOC-1R, C8FW, PDI2, GEF-2, TNNT1, BSG, 
IL17R, HK3, RALBP1, RNASE2, TPM1, BLVRB, APS, PPARD, NFE2, IL1RAP, 
S100A12, CD9, ENIGMA, HAGH, NCF1, FLOT1, ITGA2B, KIAA0750, FKBP8, DUSP6 
and CBFA2T3. 

[0017] In yet another embodiment, the one or more genes include at least 1, 2, 3, 4, 

5, 6, 7, 8, 9, 10, 12, 14, 16, 18, or more genes selected from the group consisting of: 
NUMA1, CXCR4, IL10RA, M9, FAU, BRF2, RPS6, EEF1A2, BAGS, AKR1B1, 
UNK_AL022721, C1QBP, DKZP586E0820, NONO, PSMD3, UNK_N74607, 
UNK_AI743507, MAPKAPK5, and UNK_U79297. 

[0018] In another preferred embodiment, the one or more genes include at least 1, 2, 

3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 16, 18, 20, or more genes, each of which has an RNA transcript 
capable of hybridizing under stringent conditions to a different respective classification 
probe sequence (CPS) selected from CPS-Table-2. In one specific example, if the one or 
more genes consist of only one gene, the RNA transcript(s) of the gene can not hybridize 
under stringent conditions to a CPS selected from the group consisting of CPSs 58, 211, 221 
and 241. In another specific example, if the one or more genes consist of two genes, the 
RNA transcript(s) of the two genes can not hybridize under stringent conditions to CPSs 
211 and 241. 

[0019] In one embodiment, the one or more genes include at least 1, 2, 3, 4, 5, 6, 7, 

8, 9, 10, 12, 14, 16, 18, or 20 genes, each of which has an RNA transcript capable of 
hybridizing under stringent conditions to a different respective CPS selected from the group 
consisting of: CPS 1, CPS 3, CPS 4, CPS 6, CPS 18, CPS 38, CPS 53, CPS 255, CPS 256, 
CPS 257, CPS 258, CPS 259, CPS 260, CPS 261, CPS 262, CPS 263, CPS 264, CPS 265, 
CPS 266, and CPS 267. 

[0020] In another embodiment, the one or more genes include at least 1, 2, 3, 4, 5, 6, 

7, 8, 9, 10, 12, 14, 16, 18, or 20 genes, each of which has an RNA transcript capable of 
hybridizing under stringent conditions to a different respective CPS selected from the group 
consisting of: CPSs 1, 3, 4, 5, 6, 7, 9, 10, 11, 16, 28, 31, 268, 264, 279, 280, 281, 282, 283 
and 284. 

[0021] In yet another embodiment, the one or more genes include at least 1, 2, 3, 4, 

5, 6, 7, 8, 9, 10, 12, 14, 16, 18, or 20 genes, each of which has an RNA transcript capable of 
hybridizing under stringent conditions to a different respective CPS selected from the group 
consisting of: CPSs 17, 31, 37, 50, 59, 64, 69, 71, 264, 268, 269, 270, 271, 272, 273, 274, 
275, 276, 277 and 278. 

[0022] In still yet another embodiment, the one or more genes include at least 1, 2, 

3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 16, 18, 20, or more genes, each of which has an RNA transcript 
capable of hybridizing under stringent conditions to a different respective CPS selected 
from the group consisting of: CPSs 1, 2, 8, 16, 19, 26, 28, 57, 58, 61, 91, 92, 99, 138, 143, 
148, 152, 191, 192, 207, 221, 229, 236 and 245. 

[0023] In yet another embodiment, the one or more genes include at least 1, 2, 3, 4, 

5, 6, 7, 8, 9, 10, 12, 14, 16, 18, 20, or more genes, each of which has an RNA transcript 
capable of hybridizing under stringent conditions to a different respective CPS selected 
from the group consisting of: CPSs 1, 4, 9, 10, 11, 12, 14, 17, 18, 19, 21, 25, 28, 34, 35, 40, 
47, 52, 53, 58, 61, 62, 84, 87, 91, 92, 94, 99, 104, 105, 109, 111, 115, 125, 128, 130, 133, 
135, 138, 143, 146, 147, 148, 151, 154, 157, 158, 165, 173, 174, 178, 191, 192, 194, 195, 
201, 211, 220, 222, 227, 244, 247 and 250. 

[0024] In one further embodiment, the one or more genes include at least 1, 2, 3, 4, 

5, 6, 7, 8, 9, 10, 12, 14, 16, 18, or more genes, each of which has an RNA transcript capable 
of hybridizing under stringent conditions to a different respective CPS selected from the 
group consisting of: CPSs 107, 131, 255, 256, 258, 259, 265, 266, 285, 286, 287, 288, 289, 
290, 291, 292, 293, 294, and 295. 

[0025] In yet another preferred embodiment, the one or more genes include at least 

1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 12, 14, 16, 18, 20, or more genes, each of which has an RNA 
transcript capable of hybridizing under stringent or nucleic acid array hybridization 
conditions to a different respective qualifier selected from ATTACHMENT A. In one 
specific example, if the one or more genes consist of only one gene, the RNA transcript(s) 
of the gene can not hybridize under stringent or nucleic acid array hybridization conditions 
to a qualifier selected from the group consisting of 37148_at, 39402_at, 31859_at and 
38299_at. In another specific example, if the one or more genes consist of two genes, the 
RNA transcript(s) of the two genes can not hybridize under stringent or nucleic acid array 
hybridization conditions to qualifiers 39402_at and 38299_at. 

[0026] In accordance with another aspect of the present invention, a method is 

provided that is useful for diagnosing or confirming a non-blood disease. The non-blood 
disease can be a solid tumor such as RCC, prostate cancer, or head/neck cancer. The non- 
blood disease can also be a non-tumor disease, including diseases capable of causing renal 
failure. The method includes the steps of providing at least one peripheral blood sample of 
a human having the non-blood disease, and comparing an expression profile of one or more 
genes in the at least one peripheral blood sample to at least one reference expression profile 
of the one or more genes, where each of the one or more genes is differentially expressed in 
PBMCs of patients having the non-blood disease as compared to PBMCs of disease-free 
humans. 

[0027] In one embodiment, the one or more genes comprise at least 1, 2, 3, 4, 5, 6, 

7, 8, 9, 10, 12, 14, 16, 18, 20, or more genes selected from Gene-Table-4, and the peripheral 
blood sample is a whole blood sample or a sample comprising enriched PBMCs. In another 
embodiment, the reference expression profile(s) include an expression profile of the one or 
more genes in peripheral blood samples of humans who do not have the non-blood disease 
or are disease-free. In yet another embodiment, the average expression level of each of the 
one or more genes in PBMCs of patients having the non-blood disease is substantially 
higher or substantially lower than that in PBMCs of humans who do not have the non-blood 
disease or are disease-free. 

[0028] In accordance with yet another aspect of the present invention, a method is 

provided that is useful for identifying a gene that is differentially expressed in peripheral 
blood samples of non-blood disease patients as compared to peripheral blood samples of 
reference humans. The method comprises the steps of providing an expression profile of 
one or more genes in peripheral blood samples of non-blood disease patients, providing a 
reference expression profile of the one or more genes in peripheral blood samples of 
reference humans, and comparing the expression profile to the reference expression profile 
to identify a gene that is differentially expressed in non-blood disease patients relative to 
reference humans. The expression profile and the reference expression profile can be 
determined, for example, by hybridizing cRNA or cDNA prepared from the peripheral 
blood samples to one or more nucleic acid arrays. The reference humans can be disease-free 
humans. The reference humans can also have the non-blood disease but at a different 
disease stage or with a different clinical response than the patients being investigated. In 
one embodiment, the non-blood disease is a solid tumor. 

[0029] In accordance with still yet another aspect of the present invention, a kit is 

provided that is useful for diagnosis of RCC or other solid tumors. In one embodiment, the 
kit includes at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, or more polynucleotides, each 
polynucleotide capable of hybridizing under stringent conditions to an RNA transcript, or 
the complement thereof, of a different respective gene which is differentially expressed in 
PBMCs of patients having a solid tumor as compared to PBMCs of disease-free humans. In 
another embodiment, the kit includes at least 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 15, 20, 25, 30, or 
more antibodies, each antibody capable of binding to a polypeptide encoded by a different 
respective gene which is differentially expressed in PBMCs of patients having a solid tumor 
relative to disease-free humans. 

[0030] In accordance with a further aspect of the present invention, a system is 

provided that is useful for diagnosis of a non-blood disease. The non-blood disease can be a 
solid tumor, such as RCC, prostate cancer, or head/neck cancer. The system includes a 
memory which stores one or more reference expression profiles of at least one gene in 
peripheral blood samples of reference humans. Each gene is differentially expressed in 
PBMCs of patients having the non-blood disease as compared to PBMCs of disease-free 
humans. The peripheral blood samples can be whole blood samples or samples comprising 
enriched PBMCs. The one or more reference expression profiles can include a peripheral 
blood expression profile of disease-free humans. The one or more reference expression 
profiles can also include a peripheral blood expression profile of patients having the non- 
blood disease. In addition, the one or more reference expression profiles can include a 
peripheral blood expression profile of patients having another non-blood disease. The 
system further includes a program capable of comparing an expression profile of interest to 
the one or more reference expression profiles, and a processor capable of executing the 
program. In one embodiment, the program employs a weighted voting algorithm. 



BRIEF DESCRIPTION OF THE DRAWINGS 

[0031] The patent or application file contains at least one drawing executed in color. 

Copies of this patent or patent application publication with color drawing(s) will be 
provided by the Office upon request and payment of the necessary fee. 
[0032] The drawings are provided for illustration, not limitation. 
[0033] FIG. 1 depicts the statistical verification of the RCC disease genes identified 

in this invention. 

[0034] FIG. 2 shows a dendrogram of sample relatedness using expressed gene 

expression values. 

[0035] FIG. 3 is a diagram summarizing the training set cross validation results for 

predictor gene sets of increasing size. 

[0036] FIG. 4 illustrates the relative expression levels of a set of eight predictive 

genes in a training set. 

[0037] FIG. 5A demonstrates the cross validation results for each sample in the 

training set using the 8-gene predictor set as illustrated in FIG. 4. 

[0038] FIG. 5B shows the prediction results for the remaining test set of RCC and 

normal PBMC samples using the 8-gene predictor set as illustrated in FIG. 4. 

DETAILED DESCRIPTION 



I. DEFINITION 

[0039] As used herein, "CPS-Table-2" refers to the entire classification probe 

sequences (CPSs) listed in Table 2. 

[0040] "Gene-Table-4" refers to all of the genes listed in Table 4. 

[0041] A "gene" refers to a DNA sequence in the human genome, from which at 

least one RNA molecule can be transcribed. As used in the present invention, a gene can be 
a hypothetical or putative gene the expression of which is supported by EST or mRNA data. 
[0042] A "disease-free human" refers to a human who does not have any detectable 

cancer or other diseases which require medical attention or treatment. 

[0043] "Stringent conditions" are at least as stringent as, for example, conditions G-L 
shown in Table 1. "Highly stringent conditions" are at least as stringent as conditions A-F 
shown in Table 1. As used in Table 1, hybridization is carried out under the hybridization 
conditions (Hybridization Temperature and Buffer) for about four hours, followed by two 
20-minute washes under the corresponding wash conditions (Wash Temp. and Buffer). 



Table 1. Stringency Conditions 

Stringency  Polynucleotide  Hybrid           Hybridization Temperature                      Wash Temp.
Condition   Hybrid          Length (bp)(1)   and Buffer(H)                                  and Buffer(H)
----------  --------------  ---------------  ---------------------------------------------  --------------
A           DNA:DNA         >50              65°C; 1xSSC -or- 42°C; 1xSSC, 50% formamide    65°C; 0.3xSSC
B           DNA:DNA         <50              TB*; 1xSSC                                     TB*; 1xSSC
C           DNA:RNA         >50              67°C; 1xSSC -or- 45°C; 1xSSC, 50% formamide    67°C; 0.3xSSC
D           DNA:RNA         <50              TD*; 1xSSC                                     TD*; 1xSSC
E           RNA:RNA         >50              70°C; 1xSSC -or- 50°C; 1xSSC, 50% formamide    70°C; 0.3xSSC
F           RNA:RNA         <50              TF*; 1xSSC                                     TF*; 1xSSC
G           DNA:DNA         >50              65°C; 4xSSC -or- 42°C; 4xSSC, 50% formamide    65°C; 1xSSC
H           DNA:DNA         <50              TH*; 4xSSC                                     TH*; 4xSSC
I           DNA:RNA         >50              67°C; 4xSSC -or- 45°C; 4xSSC, 50% formamide    67°C; 1xSSC
J           DNA:RNA         <50              TJ*; 4xSSC                                     TJ*; 4xSSC
K           RNA:RNA         >50              70°C; 4xSSC -or- 50°C; 4xSSC, 50% formamide    67°C; 1xSSC
L           RNA:RNA         <50              TL*; 2xSSC                                     TL*; 2xSSC

(1): The hybrid length is that anticipated for the hybridized region(s) of the 
hybridizing polynucleotides. When hybridizing a polynucleotide to a target polynucleotide 
of unknown sequence, the hybrid length is assumed to be that of the hybridizing 
polynucleotide. When polynucleotides of known sequence are hybridized, the hybrid length 
can be determined by aligning the sequences of the polynucleotides and identifying the 
region or regions of optimal sequence complementarity. 

(H): SSPE (1xSSPE is 0.15M NaCl, 10mM NaH2PO4, and 1.25mM EDTA, pH 7.4) 
can be substituted for SSC (1xSSC is 0.15M NaCl and 15mM sodium citrate) in the 
hybridization and wash buffers. 

TB* - TR*: The hybridization temperature for hybrids anticipated to be less than 50 
base pairs in length should be 5-10°C less than the melting temperature (Tm) of the hybrid, 
where Tm is determined according to the following equations. For hybrids less than 18 base 
pairs in length, Tm (°C) = 2(# of A + T bases) + 4(# of G + C bases). For hybrids between 
18 and 49 base pairs in length, Tm (°C) = 81.5 + 16.6(log10[Na+]) + 0.41(%G + C) - (600/N), 
where N is the number of bases in the hybrid, and [Na+] is the molar concentration of sodium 
ions in the hybridization buffer ([Na+] for 1xSSC = 0.165M). 
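
The two Tm equations in the footnote above can be illustrated with a short Python sketch. It is only an illustration, not part of the disclosed method: the function names, the default sodium concentration argument, and the assumption that the sequence contains only A, C, G and T (or U) bases are all hypothetical choices made for this example.

```python
import math


def melting_temperature(sequence, sodium_molar=0.165):
    """Estimate Tm (deg C) for a hybrid shorter than 50 base pairs.

    sodium_molar defaults to 0.165 M, the value given for 1xSSC.
    Assumes the sequence contains only A, C, G and T (or U) bases.
    """
    seq = sequence.upper()
    n = len(seq)
    gc = seq.count("G") + seq.count("C")
    at = n - gc  # every non-G/C base is counted as A or T (assumption)
    if n < 18:
        # Tm = 2(# of A + T bases) + 4(# of G + C bases)
        return 2 * at + 4 * gc
    if n <= 49:
        # Tm = 81.5 + 16.6(log10[Na+]) + 0.41(%G + C) - (600/N)
        percent_gc = 100.0 * gc / n
        return 81.5 + 16.6 * math.log10(sodium_molar) + 0.41 * percent_gc - 600.0 / n
    raise ValueError("the equations apply only to hybrids shorter than 50 base pairs")


def hybridization_temperature_range(sequence, sodium_molar=0.165):
    """Return (low, high) hybridization temperatures, 10 to 5 deg C below Tm."""
    tm = melting_temperature(sequence, sodium_molar)
    return tm - 10, tm - 5
```

For example, `hybridization_temperature_range("ACGTACGTAC")` applies the short-hybrid equation to a 10-base probe and returns the window 5-10°C below its estimated Tm.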

[0044] Various aspects of the invention are described in further detail in the 

following sections and subsections. The use of sections and subsections is not meant to 
limit the invention; each section and subsection may apply to any aspect of the invention. 



II. THE INVENTION 

[0045] The present invention provides methods for diagnosing RCC and other solid 

tumors by detecting gene expression patterns in peripheral blood. The present invention 
identifies a plurality of RCC disease genes which are differentially expressed in the 
peripheral blood of RCC patients compared to disease-free humans. At least a subset of 
these RCC disease genes is also differentially expressed in other solid tumors such as 
prostate cancer and head/neck cancer. Therefore, these genes can be used as surrogate 
markers for detecting the presence or absence of RCC and/or other solid tumors. In one 
embodiment, the expression patterns of these genes in peripheral blood can be determined 
by assessing the levels of RNA transcripts of these genes in peripheral blood samples. The 
peripheral blood samples may be the whole blood or blood samples containing enriched 
PBMCs. Suitable methods for detecting RNA levels include, but are not limited to, 
quantitative RT-PCR, Northern Blot, in situ hybridization, Southern Blot, slot-blotting, 
nuclease protection assay, and nucleic acid arrays. In another embodiment, the gene 
expression patterns can be determined by detecting the levels of polypeptides encoded by 
the solid tumor disease genes. Suitable methods include, but are not limited to, 
immunoassays such as ELISA (enzyme-linked immunosorbent assay), RIA 
(radioimmunoassay), FACS (fluorescence-activated cell sorter), or Western Blot. Methods 
based on 2-dimensional SDS-polyacrylamide gel electrophoresis can also be used. 

A. General Methods for Identifying RCC and Solid Tumor Disease Genes in 
Peripheral Blood 

[0046] The availability of the human genome sequence, together with new 

developments in technology, such as DNA microarrays, proteomics and computational 
biology, allows systematic gene expression studies for various diseases. This invention 
employs the systematic gene expression analysis technique to identify genes and/or markers 
that are differentially expressed in the peripheral blood of patients with solid tumors such as 
RCC, prostate cancer, and head/neck cancer. These genes are herein referred to as "solid 
tumor disease genes." In particular, the genes that are differentially expressed in the 
peripheral blood of RCC patients compared to disease-free humans are referred to as "RCC 
disease genes." 

[0047] Solid tumor disease genes are either over-expressed or under-expressed 

(including no expression) in the peripheral blood of solid tumor patients compared to 
disease-free humans. Therefore, solid tumor disease genes can be identified by comparing 
the gene expression patterns of solid tumor patients to the corresponding gene expression 
patterns of disease-free humans. Methods for detecting and comparing gene expression 
patterns are well known in the art. 

[0048] In one embodiment, the gene expression patterns are detected by measuring 

the levels of RNA transcripts in the peripheral blood. For instance, total RNAs or polyA+ 
RNAs can be isolated from a peripheral blood sample. As used herein, a biological 
material, such as a polynucleotide, a polypeptide, a cell or a blood sample, is "isolated" if 
the biological material is removed from its native environment. For instance, a 
polynucleotide or a polypeptide can be isolated through a purification or extraction process. 
A blood sample can be isolated when it is removed from the human body. 
[0049] The isolated RNAs are then amplified to produce cDNAs or cRNAs. The 

level of expression of a gene in the peripheral blood sample can be determined by 
measuring the amount of the corresponding cDNAs or cRNAs thus amplified. 
[0050] One exemplary amplification protocol uses reverse transcriptase. For 

instance, isolated mRNAs can be first reverse transcribed into cDNAs using a reverse 
transcriptase, and a primer consisting of oligo d(T) and a sequence encoding the phage T7 
promoter. The cDNAs thus produced are single-stranded. The second strands of the 
cDNAs are synthesized using a DNA polymerase, combined with an RNase to break up the 
DNA/RNA hybrid. After synthesis of the double-stranded cDNAs, T7 RNA polymerase is 
added, and cRNAs are then transcribed from the second strands of the double-stranded 
cDNAs. 

[0051] In another embodiment, the gene expression patterns can be analyzed by 

measuring the levels of polypeptides in the peripheral blood. The amounts of polypeptides 
in a peripheral blood sample can be detected using various methods well known in the art. 
Suitable methods include, but are not limited to, immunoassays such as ELISA, RIA, FACS 
and Western Blot. High-throughput protein sequencing and identification methods can also 
be used, such as the methods based on two-dimensional gel electrophoresis and mass 
spectrometry. 

[0052] In a preferred embodiment, the peripheral blood samples used for isolating 

RNA or polypeptides contain enriched or purified peripheral blood mononuclear cells 
(PBMCs). Methods for preparing blood samples with concentrated PBMCs are well known 
in the art. For instance, whole blood isolated from human subjects can be centrifuged 
through Ficoll gradients or CPTs (cell purification tubes), and the fraction containing 
enriched PBMCs is collected. "Enriched" means that the percentage of PBMCs in the 
sample is higher than the percentage of PBMCs in the initial whole blood. For instance, the 
percentage of PBMCs in the enriched sample can be at least 2, 3, 4, 5 or more times higher 
than that in the initial whole blood. In one embodiment, whole blood can be directly used to 
screen for solid tumor disease genes. 

[0053] In another preferred embodiment, polynucleotide arrays, such as cDNA or 

oligonucleotide arrays, can be used to detect and/or compare the gene expression profiles in 
the peripheral blood of solid tumor patients versus disease-free humans. Polynucleotide 
arrays allow quantitative detection and monitoring of the levels of RNA transcripts of a 
large number of genes at one time. Polynucleotide arrays suitable for this global gene 
expression analysis include, but are not limited to, commercially available arrays such as 
Genechip® arrays from Affymetrix (Santa Clara, CA) or cDNA microarrays from Agilent 
Technologies (Palo Alto, CA). 

[0054] Polynucleotides to be hybridized to microarrays can be labeled with one or 

more labeling moieties to allow for detection of hybridized polynucleotide complexes. The 
labeling moieties can include compositions that can be detected by spectroscopic, 
photochemical, biochemical, bioelectronic, immunochemical, electrical, optical or chemical 
means. The labeling moieties include radioisotopes, chemiluminescent compounds, labeled 
binding proteins, heavy metal atoms, spectroscopic markers such as fluorescent markers and 
dyes, magnetic labels, linked enzymes, mass spectrometry tags, spin labels, electron transfer 
donors and acceptors, and the like. The polynucleotides to be hybridized to the microarrays 
can be either DNA or RNA. 

[0055] Hybridization reactions can be performed in absolute or differential 

hybridization formats. In the absolute hybridization format, polynucleotides derived from 
one sample, such as a peripheral blood sample from an RCC patient or a disease-free human, 
are hybridized to the probes in a microarray. Signals detected after the formation of 
hybridization complexes correlate to the polynucleotide levels in the sample. In the 
differential hybridization format, polynucleotides derived from two biological samples, such 
as one from solid tumor patients and the other from disease-free humans, are labeled with 
different labeling moieties. A mixture of these differently labeled polynucleotides is added 
to a microarray. The microarray is then examined under conditions in which the emissions 
from the two different labels are individually detectable. In one embodiment, the 
fluorophores Cy3 and Cy5 (Amersham Pharmacia Biotech, Piscataway, NJ) are used as the 
labeling moieties for the differential hybridization format. 

[0056] Signals gathered from microarrays can be analyzed using commercially 

available software, such as that provided by Affymetrix or Agilent Technologies. Controls, 
such as for scan sensitivity, probe labeling and cDNA quantitation, preferably are included 
in the hybridization experiments. The microarray expression signals can be scaled or 
normalized before being subject to further analysis. For instance, the expression signals for 
each gene can be normalized to take into account variations in hybridization intensities 
when more than one array is used under similar test conditions. Signals for individual 
polynucleotide complex hybridization can also be normalized using the intensities derived 
from internal normalization controls contained on each array. In addition, genes with 
relatively consistent expression levels across the samples can be used to normalize the 
expression levels of other genes. In one embodiment, the expression levels of the genes are 
normalized across the samples such that the mean is zero and the standard deviation is one. 
In another embodiment, the expression data detected by the microarray are subject to a 
variation filter which excludes genes showing minimal or insignificant variation across all 
samples. 
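
The normalization and filtering steps described above can be illustrated with a minimal sketch. It assumes expression levels are held in a plain dictionary keyed by gene, uses the population standard deviation, and applies hypothetical cutoffs (`min_fold`, `min_delta`) for the variation filter; none of these choices is specified in this description.

```python
from statistics import mean, pstdev


def zscore(levels):
    """Rescale one gene's levels so the mean is 0 and the standard deviation is 1."""
    m = mean(levels)
    sd = pstdev(levels)  # population standard deviation (an assumption)
    if sd == 0:
        return [0.0 for _ in levels]  # a constant gene carries no variation
    return [(x - m) / sd for x in levels]


def variation_filter(expression, min_fold=2.0, min_delta=5.0):
    """Keep genes whose max/min ratio and max-min difference reach the cutoffs.

    The cutoff values are hypothetical and only serve the illustration.
    """
    kept = {}
    for gene, levels in expression.items():
        lo, hi = min(levels), max(levels)
        if lo > 0 and hi / lo >= min_fold and hi - lo >= min_delta:
            kept[gene] = levels
    return kept


# Hypothetical expression levels for two genes across four samples.
expression = {
    "gene_a": [10.0, 40.0, 25.0, 55.0],
    "gene_b": [20.0, 21.0, 19.5, 20.5],
}
filtered = variation_filter(expression)
normalized = {gene: zscore(levels) for gene, levels in filtered.items()}
```

In this sketch the nearly constant gene is excluded before normalization, so that rescaling does not amplify noise in genes that show no meaningful variation.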

[0057] The gene expression profiles in the peripheral blood samples of solid tumor 

patients can be compared to the corresponding gene expression profiles in the peripheral 
blood samples of disease-free humans. Genes that are differentially expressed in solid 
tumor patients relative to disease-free humans are identified. Preferably, the level of 
expression of a solid tumor disease gene is substantially higher or lower in solid tumor 
patients than in disease-free humans. "Substantially higher" means that the average 
expression level of a gene in the peripheral blood samples of solid tumor patients is at least 
1.5 times the average expression level of the gene in the peripheral blood samples of 
disease-free humans. For instance, the average expression level in solid tumor patients can 
be at least 2, 3, 4, 5, 10, 20, or more times the average expression level in disease-free 
humans. "Substantially lower" means that the average expression level of a gene in the 
peripheral blood samples of solid tumor patients is no greater than 0.67 times the 
average expression level of the gene in the peripheral blood samples of disease-free 
humans. For instance, the average expression level in solid tumor patients can be no greater 
than 0.5, 0.33, 0.25, 0.1, 0.05, or an even smaller multiple of the average expression level in disease-free 
humans. 
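
The two thresholds defined above can be applied as in the following sketch. The function name and the returned labels are illustrative assumptions; only the 1.5 and 0.67 cutoffs come from this paragraph.

```python
def classify_fold_change(mean_tumor, mean_disease_free, up=1.5, down=0.67):
    """Label a gene by the ratio of its average expression levels."""
    if mean_disease_free <= 0:
        raise ValueError("average disease-free expression level must be positive")
    ratio = mean_tumor / mean_disease_free
    if ratio >= up:
        return "substantially higher"
    if ratio <= down:
        return "substantially lower"
    return "not substantially different"


# Illustrative average expression levels (tumor patients, disease-free humans).
print(classify_fold_change(34.8, 13.8))
print(classify_fold_change(8.27, 19.5))
print(classify_fold_change(10.0, 9.0))
```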

[0058] In one embodiment, solid tumor disease genes can be identified using 

clustering algorithms based on the microarray gene expression data. For instance, 
unsupervised cluster analysis can be used to analyze and categorize genes with different 
expression patterns, thereby identifying solid tumor disease genes. Algorithms for 
unsupervised cluster analysis include, but are not limited to, self-organizing maps (SOMs), 
principal component analysis, average linkage clustering, and hierarchical clustering. 
[0059] Supervised cluster analysis can also be employed to organize and identify 

solid tumor disease genes. Under supervised cluster analysis, the disease status of the 
source from which a gene expression pattern is derived is already known. Algorithms for 
supervised cluster analysis include, but are not limited to, nearest neighbors test, support 
vector machines, and SPLASH. Either two-class or multi-class correlation metrics can be 
used. 
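
As one illustration of a supervised approach in which the disease status of each training sample is known, a nearest-neighbors classifier can be sketched as follows. The Euclidean distance, the value of `k`, and the two-gene profiles are assumptions made for the example and are not taken from this description.

```python
from collections import Counter
from math import dist


def knn_predict(train_profiles, train_labels, query, k=3):
    """Assign the majority label among the k training profiles closest to the query."""
    ranked = sorted(
        zip(train_profiles, train_labels),
        key=lambda pair: dist(pair[0], query),  # Euclidean distance (an assumption)
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Hypothetical two-gene expression profiles with known disease status.
profiles = [(1.0, 1.0), (1.2, 0.9), (0.9, 1.1), (5.0, 5.2), (5.1, 4.9), (4.8, 5.0)]
labels = ["RCC", "RCC", "RCC", "disease-free", "disease-free", "disease-free"]

print(knn_predict(profiles, labels, (1.1, 1.0)))
print(knn_predict(profiles, labels, (5.0, 5.0)))
```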

[0060] In a preferred embodiment, a permutation test-based neighborhood analysis 

is used to analyze the microarray gene expression data in order to identify solid tumor 
disease genes. The algorithm for the neighborhood analysis is described in T.R. Golub, et 
al., Science, 286: 531-537 (1999), and D.K. Slonim et al., Procs. of the Fourth Annual 
International Conference on Computational Molecular Biology, Tokyo, Japan, April 8-11, 
p263-272 (2000), both of which are incorporated herein by reference. 

[0061] Under one form of the neighborhood analysis, the expression profile of each 

gene is represented by an expression vector g = (e_1, e_2, e_3, ..., e_n), where e_i corresponds to 
the expression level of gene "g" in the ith sample. A class distinction is represented by an 
idealized expression pattern c = (c_1, c_2, c_3, ..., c_n), where c_i = 1 or -1, depending on 
whether the ith sample is isolated from class 0 or class 1. Class 0 may consist of patients 
with a particular solid tumor such as RCC, and class 1 may represent disease-free humans. 
Class 0 may also consist of patients with different solid tumors. 

[0062] The correlation of gene "g" to the class distinction can be calculated using a 

signal-to-noise score: 

\[ P(g,c) = \frac{x_0(g) - x_1(g)}{sd_0(g) + sd_1(g)} \]

where x_0(g) and x_1(g) represent the means of the log of the expression level of gene "g" in 
class 0 and class 1, respectively, and sd_0(g) and sd_1(g) represent the standard deviation of 
the log of the expression of gene "g" in class 0 and class 1, respectively. A higher absolute 
value of a signal-to-noise score indicates that the corresponding gene is more highly 
expressed in one class than in the other. An unusually high density of genes within the 
neighborhoods of the class distinction, as compared to random patterns, suggests that many 
genes have expression patterns that are significantly correlated with the class distinction. 
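
The signal-to-noise score defined above can be computed as in the following sketch. It assumes natural logarithms, the sample standard deviation, and the convention that a label of 1 marks class 0 and a label of -1 marks class 1; these choices are assumptions, since the description does not fix them.

```python
from math import log
from statistics import mean, stdev


def signal_to_noise(levels, labels):
    """Return P(g,c) = (x0 - x1) / (sd0 + sd1), computed on log expression levels.

    levels: expression levels of one gene across the samples (all positive).
    labels: 1 for samples from class 0, -1 for samples from class 1.
    """
    logs0 = [log(e) for e, c in zip(levels, labels) if c == 1]
    logs1 = [log(e) for e, c in zip(levels, labels) if c == -1]
    noise = stdev(logs0) + stdev(logs1)  # sample standard deviations (an assumption)
    if noise == 0:
        raise ValueError("both classes have zero variance; the score is undefined")
    return (mean(logs0) - mean(logs1)) / noise


# Hypothetical gene: three class 0 samples followed by three class 1 samples.
score = signal_to_noise([8, 16, 32, 1, 2, 4], [1, 1, 1, -1, -1, -1])
print(round(score, 6))
```

A positive score in this sketch indicates higher expression in class 0, and a negative score indicates higher expression in class 1.
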
[0063] A plurality of solid tumor disease genes can be selected using the 

neighborhood analysis. In one embodiment, each solid tumor disease gene thus selected has 
a substantially higher or lower expression level in PBMCs of solid tumor patients than in 
PBMCs of disease-free humans. In another embodiment, the selected solid tumor disease 
genes have top absolute values of P(g,c). In yet another embodiment, the selected solid 
tumor disease genes include both genes that are highly expressed in class 0 (such as RCC 
patients), and genes that are highly expressed in class 1 (such as disease-free humans). The 
solid tumor disease genes selected in the present invention can be involved in different 
biological pathways or mechanisms. 

[0064] In one embodiment, the number of the selected solid tumor disease genes is 

limited to those shown to be significantly correlated by the permutation test, such as at the 
1% or 2% significance level. As used herein, x% significance level means that x% of random 
neighborhoods contain as many genes as the real neighborhood around the class distinction. 
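
A permutation test of this kind can be sketched as follows. The sketch treats the neighborhood as the set of genes whose absolute signal-to-noise score reaches a chosen threshold, and it estimates how often randomly permuted class labels yield a neighborhood at least as large as the real one. The threshold, the number of permutations, and the toy data are assumptions for illustration; the cited references should be consulted for the actual procedure.

```python
import random
from math import copysign, inf, log
from statistics import mean, stdev


def signal_to_noise(levels, labels):
    """Signal-to-noise score on log levels; label 1 marks class 0, -1 marks class 1."""
    logs0 = [log(e) for e, c in zip(levels, labels) if c == 1]
    logs1 = [log(e) for e, c in zip(levels, labels) if c == -1]
    diff = mean(logs0) - mean(logs1)
    noise = stdev(logs0) + stdev(logs1)
    if noise == 0:
        # Perfectly separated constant classes: treat as an unbounded score.
        return 0.0 if diff == 0 else copysign(inf, diff)
    return diff / noise


def neighborhood_size(expression, labels, threshold):
    """Count genes whose absolute score reaches the threshold."""
    return sum(
        1 for levels in expression.values()
        if abs(signal_to_noise(levels, labels)) >= threshold
    )


def permutation_significance(expression, labels, threshold, n_perm=200, seed=0):
    """Fraction of label permutations whose neighborhood is at least as large as the real one."""
    rng = random.Random(seed)
    observed = neighborhood_size(expression, labels, threshold)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # keeps the number of samples in each class
        if neighborhood_size(expression, shuffled, threshold) >= observed:
            hits += 1
    return observed, hits / n_perm


# Hypothetical data: three genes measured in three class 0 and three class 1 samples.
expression = {
    "gene_up": [8, 16, 32, 1, 2, 4],
    "gene_down": [1, 2, 4, 8, 16, 32],
    "gene_flat": [5, 6, 5, 6, 5, 6],
}
labels = [1, 1, 1, -1, -1, -1]
observed, fraction = permutation_significance(expression, labels, threshold=1.0)
print(observed, fraction)
```

A small returned fraction suggests that the real class distinction attracts more genes than random patterns do.
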
[0065] The general methods for identifying solid tumor disease genes can be used to 

identify genes whose expression levels in the peripheral blood or PBMCs correlate with 
different stages of the development, progression or treatment of solid tumors. Patients can 
be grouped based on their different disease development or treatment stages. The global 
gene expression analysis can be employed to search for genes that are differentially 
expressed in one stage compared to another stage. The genes thus identified can be used as 
markers for monitoring the progression or treatment of solid tumors. 

B. Identification of RCC Disease Genes 

[0066] In one embodiment, HG-U95Av2 gene chips (manufactured by Affymetrix) 

are used for detecting and comparing the levels of RNA transcripts in PBMC-enriched 
peripheral blood samples prepared from RCC patients and disease-free humans. Table 2 
lists examples of qualifiers on a HG-U95Av2 gene chip. Each qualifier represents multiple 
oligonucleotide probes that are stably attached to discrete regions on the gene chip. 
ATTACHMENT A, which is incorporated herein by reference, lists examples of qualifiers 
and their corresponding oligonucleotide probes. Each qualifier in Table 2 corresponds to at 
least one RCC disease gene which is differentially expressed in the peripheral blood of RCC 
patients compared to disease-free humans. In general, the corresponding RCC disease 
gene(s) of a qualifier can hybridize under stringent or nucleic acid array hybridization 
conditions to the oligonucleotide probes listed under the same qualifier in ATTACHMENT 
A. 

[0067] The SEQ ID NO listed under each qualifier in Table 2 depicts a cDNA or 

genomic sequence, or the complement thereof, of the corresponding RCC disease gene(s). 
Fragments of the SEQ ID NO can be used to make oligonucleotide probes for detecting the 
RNA transcripts of the corresponding RCC disease gene(s). ATTACHMENT A includes 
some examples of the oligonucleotide probes thus made. 

[0068] Each SEQ ID NO may have a corresponding Entrez Nucleotide Sequence 

Database accession number. The SEQ ID NOs and their corresponding accession numbers 
are illustrated in Table 3. The Entrez Nucleotide Sequence Database is maintained by the 
National Center for Biotechnology Information (NCBI), National Library of Medicine, 
Washington, DC, U.S.A. The Database is publicly known and readily accessible. The 
Entrez Nucleotide Sequence Database contains sequence data from GenBank, EMBL and 
DDBJ. The sequence depicted under each SEQ ID NO can be derived from the sequence 
disclosed under the corresponding Entrez accession number. 

[0069] The ambiguous nucleotide residues ("n") in the SEQ ID NOs can be 

determined using methods as appreciated by one of ordinary skill in the art. For instance, 
the ambiguous residues can be determined by aligning the SEQ ID NOs to their 
corresponding genes. The sequences of these genes can be obtained from various human 
genome sequence databases. The ambiguous nucleotide residues can also be determined by 
re-sequencing the corresponding SEQ ID NOs or the sequences under the corresponding 
Entrez accession numbers. Generally, each ambiguous position either represents at least 
one nucleotide selected from a, c, g, or t, or contains no nucleotide residue. 
[0070] Each qualifier has a corresponding classification probe sequence (CPS) 

which is derived from the SEQ ID NO listed under the same qualifier. The corresponding 
CPS consists of at least part of the SEQ ID NO, or the complement thereof. Preferably, 
each CPS does not contain any ambiguous nucleotide residue. More preferably, each CPS 
comprises at least one oligonucleotide probe listed under the corresponding qualifier in 
ATTACHMENT A. Each CPS is capable of hybridizing under stringent or highly stringent 
conditions to the RNA transcripts of the RCC disease gene(s) represented by the 
corresponding qualifier. All of the CPSs listed in Table 2 are collectively referred to as 
"CPS-Table-2". 

[0071] RNA transcripts, such as mRNAs, can be isolated from PBMC-enriched 

peripheral blood samples of RCC patients and disease-free humans. cRNAs can then be 
prepared using protocols described in Affymetrix's Expression Analysis Technical 
Manuals. Subsection G of this specification provides detailed examples for sample 
preparation, HG-U95Av2 genechip hybridization, and subsequent data analysis. 
[0072] A hybridization signal is collected for each oligonucleotide probe on the 

genechip. Signals for oligonucleotide probes with the same qualifier are averaged. 
Qualifiers that produce different hybridization signals in RCC samples relative to disease- 
free samples are identified. Examples of the identified qualifiers are listed in Table 2. 
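
The averaging of probe signals under each qualifier can be sketched as follows. The input layout (a list of qualifier and signal pairs) and the signal values are hypothetical; only the qualifier names are taken from Table 2.

```python
from collections import defaultdict
from statistics import mean


def average_by_qualifier(probe_signals):
    """Average the hybridization signals of all probes that share a qualifier."""
    grouped = defaultdict(list)
    for qualifier, signal in probe_signals:
        grouped[qualifier].append(signal)
    return {qualifier: mean(signals) for qualifier, signals in grouped.items()}


# Hypothetical probe-level signals for two qualifiers.
probe_signals = [
    ("40310_at", 28.0),
    ("40310_at", 30.0),
    ("40310_at", 32.0),
    ("41126_at", 5.0),
    ("41126_at", 7.0),
]
averaged = average_by_qualifier(probe_signals)
print(averaged)
```
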
[0073] Each RCC expression profile in Table 2 ("Averaged Expression Level in 

RCC Patients") is an average of 45 RCC patients, while each expression profile for disease- 
free humans ("Averaged Expression Level in Disease-Free Humans") is an average of 20 
disease-free humans. The averaged expression level under each qualifier in Table 2 
represents the level of RNA transcripts of the corresponding RCC disease gene(s). The 
ratio of each RCC expression profile over the corresponding disease-free expression profile 
is provided under "Fold Change." The p-value of a Student's t-test (two-tailed distribution, 
two-sample unequal variance) for each qualifier is also provided. The p-value indicates the 
statistical significance of the difference between each RCC expression profile and the 
corresponding disease-free expression profile. Smaller p-values indicate greater statistical 
significance for the differences observed between RCC patients and disease-free humans. 
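
The fold change and the unequal-variance t statistic can be computed as in the following sketch. It returns the t statistic and the Welch-Satterthwaite degrees of freedom; converting these to a two-tailed p-value needs a t-distribution function, available for example through `scipy.stats.ttest_ind(a, b, equal_var=False)`. The sample values are hypothetical, and the sketch is not presented as the exact software used to produce Table 2.

```python
from math import sqrt
from statistics import mean, variance


def fold_change(rcc_levels, disease_free_levels):
    """Ratio of the average RCC level over the average disease-free level."""
    return mean(rcc_levels) / mean(disease_free_levels)


def welch_t(a, b):
    """Unequal-variance t statistic and its approximate degrees of freedom."""
    va = variance(a) / len(a)  # squared standard error of the first group mean
    vb = variance(b) / len(b)  # squared standard error of the second group mean
    t = (mean(a) - mean(b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (len(a) - 1) + vb ** 2 / (len(b) - 1))
    return t, df


# Hypothetical expression levels for one qualifier in each group.
rcc = [10.0, 12.0, 14.0, 16.0]
disease_free = [4.0, 5.0, 6.0, 7.0]

fc = fold_change(rcc, disease_free)
t, df = welch_t(rcc, disease_free)
print(round(fc, 2), round(t, 3), round(df, 2))
```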



Table 2. Comparison of Gene Expression Levels Between RCC Patients and Disease-Free Humans

CPS No. | Qualifier | CPS | Averaged Expression Level in RCC Patients (n = 45) | Averaged Expression Level in Disease-Free Humans (n = 20) | t-test p-value | Fold Change (RCC/Disease-Free)
1 | 40310_at | nucleotides 2325 to 2635 of SEQ ID NO: 1 | 34.8 | 13.8 | 4.8E-10 | 2.5
2 | 41126_at | the complement of nucleotides 81 to 523 of SEQ ID NO: 2 | 5.71 | 2.7 | 1.9E-09 | 2.1
3 | 35367_at | nucleotides 61 to 865 of SEQ ID NO: 3 | 107 | 51.4 | 2.4E-09 | 2.1
4 | 41193_at | nucleotides 2095 to 2390 of SEQ ID NO: 4 | 26.2 | 8.2 | 2.7E-09 | 3.2
5 | 38829_r_at | SEQ ID NO: 5 | 19.7 | 7.9 | 5.0E-09 | 2.5
6 | 41102_at | nucleotides 1144 to 1607 of SEQ ID NO: 6 | 8.44 | 1.95 | 5.4E-09 | 4.3
7 | 40210_at | nucleotides 616 to 1159 of SEQ ID NO: 7 | 9.89 | 4.25 | 2.1E-08 | 2.3
8 | 37069_at | nucleotides 847 to 1236 of SEQ ID NO: 8 | 4.64 | 2.2 | 2.9E-08 | 2.1
9 | 39530_at | nucleotides 1129 to 1365 of SEQ ID NO: 9 | 8.51 | 4.15 | 3.0E-08 | 2.05
10 | 38739_at | nucleotides 46637 to 47224 of SEQ ID NO: 10 | 6.4 | 3 | 3.5E-08 | 2.1
11 | 32133_at | nucleotides [illegible] to 5038 of SEQ ID NO: 11 | 12.9 | 4.45 | 3.7E-08 | 2.9
12 | 33873_at | nucleotides [illegible] to 1324 of SEQ ID NO: 12 | 15.7 | 6.9 | 4.5E-08 | 2.3
13 | 39854_r_at | nucleotides [illegible] to 1568 of SEQ ID NO: 13 | 34.6 | 14.05 | 5.5E-08 | 2.7
14 | 38546_at | nucleotides 4101 to 4542 of SEQ ID NO: 14 | 4.4 | 2.05 | 5.6E-08 | 2.1
15 | 1856_at | nucleotides 1544 to 1984 of SEQ ID NO: 15 | 8.47 | 3.7 | 5.8E-08 | 2.3
16 | 36892_at | nucleotides 3458 to 4037 of SEQ ID NO: 16 | 4.58 | 2.25 | 8.4E-08 | 2.0
17 | 37152_at | nucleotides 3047 to 3258 of SEQ ID NO: 17 | 8.47 | 3.5 | 9.9E-08 | 2.4
18 | 37603_at | nucleotides 1184 to 1653 of SEQ ID NO: 18 | 68.1 | 16.6 | 1.2E-07 | 4.1
19 | 37148_at | nucleotides 2098 to 2157 of SEQ ID NO: 19 | 41.2 | 18.25 | 1.8E-07 | 2.3
20 | 34740_at | SEQ ID NO: 20 | 65.1 | 22.25 | 1.8E-07 | 2.9
21 | 37747_at | nucleotides 127 to 557 of SEQ ID NO: 21 | 27.0 | 13.15 | 2.0E-07 | 2.05
22 | 36567_at | nucleotides 154 to 380 of SEQ ID NO: 22 | 6.02 | 2.8 | 2.1E-07 | 2.15
23 | 38956_at | nucleotides [illegible] to 1225 of SEQ ID NO: 23 | 4.56 | 2.1 | 2.8E-07 | 2.2
24 | 32207_at | nucleotides 1399 to 1771 of SEQ ID NO: 24 | 64.7 | 19.2 | 2.9E-07 | 3.4
25 | 36791_g_at | nucleotides [illegible] to 1399 of SEQ ID NO: 25 | 7.62 | 3.65 | 3.0E-07 | 2.1
26 | 31684_at | nucleotides [illegible] to 1206 of SEQ ID NO: 26 | 5.73 | 2.85 | 3.2E-07 | 2.0
27 | 1401_g_at | nucleotides 2634 to 2981 of SEQ ID NO: 27 | 6.73 | 2.3 | 3.3E-07 | 2.9
28 | 37542_at | nucleotides [illegible] to 4193 of SEQ ID NO: 28 | 8.8 | 2.35 | 3.5E-07 | 3.7
29 | 37966_at | the complement of nucleotides 34 to 320 of SEQ ID NO: 29 | 7.29 | 3.25 | 3.8E-07 | 2.2
30 | 38784_g_at | nucleotides 1231 to 1363 of SEQ ID NO: 30 | 7.51 | 2.75 | 4.1E-07 | 2.7
31 | 40331_at | nucleotides 1177 to 1673 of SEQ ID NO: 31 | 5.29 | 2 | 4.2E-07 | 2.6
32 | 40371_at | nucleotides 2127 to 2443 of SEQ ID NO: 32 | 12.0 | 3.55 | 4.3E-07 | 3.4
33 | 32339_at | the complement of nucleotides 9 to 433 of SEQ ID NO: 33 | 7.67 | 3.3 | 5.2E-07 | 2.3
34 | 34435_at | nucleotides [illegible] to 2842 of SEQ ID NO: 34 | 23.4 | 9.4 | 6.6E-07 | 2.5
35 | 37136_at | nucleotides 1547 to 2068 of SEQ ID NO: 35 | 4.78 | 2.2 | 7.0E-07 | 2.2
36 | 37285_at | nucleotides 1344 to 1921 of SEQ ID NO: 36 | 370 | 54.1 | 7.0E-07 | 6.8
37 | 37391_at | nucleotides 1022 to 1395 of SEQ ID NO: 37 | 136 | 38.45 | 1.1E-06 | 3.5
38 | 35692_at | nucleotides 557 to 1078 of SEQ ID NO: 38 | 13.6 | 4.6 | 1.1E-06 | 3.0
39 | 38449_at | SEQ ID NO: 39 | 19.5 | 4.9 | 1.1E-06 | 4.0
40 | 37002_at | nucleotides 252 to 819 of SEQ ID NO: 40 | 42.2 | 11.05 | 1.2E-06 | 3.8
41 | 1139_at | nucleotides [illegible] to 1383 of SEQ ID NO: 41 | 10.8 | 4.95 | 1.3E-06 | 2.2
42 | 1622_at | nucleotides 1830 to 2074 of SEQ ID NO: 42 | 84.2 | 39.4 | 1.4E-06 | 2.1
43 | 32606_at | nucleotides 12 to 542 of SEQ ID NO: 43 | 15.8 | 7.7 | 1.4E-06 | 2.1
44 | 39436_at | nucleotides 926 to 1154 of SEQ ID NO: 44 | 82.3 | 24.3 | 1.7E-06 | 3.4
45 | 40274_at | nucleotides 561 to 736 of SEQ ID NO: 45 | 8.27 | 19.5 | 1.7E-06 | 0.42
46 | 37945_at | nucleotides [illegible] to 1492 of SEQ ID NO: 46 | 8.13 | 3.85 | 1.9E-06 | 2.1
47 | 34255_at | nucleotides 1417 to 1798 of SEQ ID NO: 47 | 7.47 | 2.85 | 2.1E-06 | 2.6
48 | 905_at | nucleotides 268 to 814 of SEQ ID NO: 48 | 103 | 45.75 | 2.3E-06 | 2.3
49 | 1569_r_at | nucleotides 4183 to 4257 of SEQ ID NO: 49 | 9.27 | 4.45 | 2.5E-06 | 2.1
50 | 41125_r_at | SEQ ID NO: 50 | 5.2 | 2.2 | 3.0E-06 | 2.4
51 | 35256_at | nucleotides [illegible] to 2279 of SEQ ID NO: 51 | 75.9 | 28.95 | 3.0E-06 | 2.6
52 | 290_s_at | nucleotides 620 to 1233 of SEQ ID NO: 52 | 9.38 | 3.55 | 3.2E-06 | 2.6
53 | 34666_at | nucleotides 755 to 1026 of SEQ ID NO: 53 | 11.3 | 4.45 | 4.0E-06 | 2.5
54 | 34689_at | nucleotides 713 to 1179 of SEQ ID NO: 54 | 9.31 | 2.9 | 4.0E-06 | 3.2
55 | 2090_i_at | nucleotides 2 to 36 of SEQ ID NO: 55 | 54.4 | 26.2 | 4.1E-06 | 2.1
56 | 37412_at | nucleotides 1319 to 1692 of SEQ ID NO: 56 | 8.27 | 3.25 | 4.1E-06 | 2.5
57 | 39799_at | nucleotides 409 to 662 of SEQ ID NO: 57 | 24.6 | 7.2 | 4.2E-06 | 3.4
58 | 31859_at | nucleotides [illegible] to 2123 of SEQ ID NO: 58 | 6.31 | 2.7 | 4.6E-06 | 2.3
59 | 37661_at | nucleotides [illegible] to 4398 of SEQ ID NO: 59 | 19.5 | 8.35 | 4.8E-06 | 2.3
60 | 36393_at | nucleotides [illegible] to 1398 of SEQ ID NO: 60 | 5.69 | 2.7 | 5.0E-06 | 2.1
61 | 39994_at | nucleotides 1878 to 2214 of SEQ ID NO: 61 | 10.0 | 4 | 5.1E-06 | 2.5
62 | 35597_at | nucleotides 282 to 675 of SEQ ID NO: 62 | 5.22 | 2.35 | 5.3E-06 | 2.2
63 | 36780_at | nucleotides [illegible] to 1651 of SEQ ID NO: 63 | 172 | 79.95 | 5.7E-06 | 2.15
64 | 34476_r_at | nucleotides [illegible] to 4358 of SEQ ID NO: 64 | 11 | 3.5 | 5.7E-06 | 3.1
65 | 33862_at | nucleotides 1027 to 1445 of SEQ ID NO: 65 | 3.91 | 1.85 | 5.7E-06 | 2.1
66 | 956_at | SEQ ID NO: 66 | 23.0 | 8.7 | 5.8E-06 | 2.6
67 | 40769_r_at | nucleotides [illegible] to 6132 of SEQ ID NO: 67 | 22.9 | 10.35 | 6.3E-06 | 2.2
68 | 41790_at | nucleotides 80268 to 80822 of SEQ ID NO: 68 | 4.2 | 1.8 | 6.6E-06 | 2.3
69 | 40456_at | nucleotides [illegible] to 1310 of SEQ ID NO: 69 | 11.3 | 5.15 | 6.8E-06 | 2.2
70 | 40647_at | nucleotides [illegible] to 5041 of SEQ ID NO: 70 | 17.4 | 5.85 | 7.4E-06 | 3.0
71 | 31834_r_at | nucleotides 4249 to 4499 of SEQ ID NO: 71 | 5.78 | 2.85 | 7.8E-06 | 2.0
72 | 38119_at | nucleotides 437 to 935 of SEQ ID NO: 72 | 137 | 60.95 | 8.1E-06 | 2.3
73 | 1670_at | nucleotides 977 to 1421 of SEQ ID NO: 73 | 3.62 | 1.8 | 8.1E-06 | 2.0
74 | 1649_at | nucleotides 384 to 651 of SEQ ID NO: 74 | 10.5 | 4.4 | 8.1E-06 | 2.4
75 | 38868_at | nucleotides 205 to 808 of SEQ ID NO: 75 | 7.82 | 3.25 | 9.3E-06 | 2.4
76 | 37952_at | nucleotides [illegible] to 4432 of SEQ ID NO: 76 | 13.4 | 5.25 | 1.0E-05 | 2.6
77 | 654_at | nucleotides [illegible] to 2355 of SEQ ID NO: 77 | 65.4 | 21.35 | 1.1E-05 | 3.1
78 | 39839_at | nucleotides [illegible] to 1568 of SEQ ID NO: 78 | 70.2 | 16.3 | 1.2E-05 | 4.3
79 | 41743_i_at | nucleotides [illegible] to 2103 of SEQ ID NO: 79 | 10.4 | 4.1 | 1.2E-05 | 2.5
80 | 37405_at | nucleotides 1113 to 1429 of SEQ ID NO: 80 | 140 | 20.3 | 1.2E-05 | 6.9
81 | 936_s_at | nucleotides 60 to 556 of SEQ ID NO: 81 | 12.0 | 3.95 | 1.3E-05 | 3.0
82 | 37323_r_at | nucleotides 130 to 517 of SEQ ID NO: 82 | 5.09 | 2.25 | 1.6E-05 | 2.3
83 | 33336_at | SEQ ID NO: 83 | 58.0 | 7.75 | 1.7E-05 | 7.5
84 | 36229_at | nucleotides 2518 to 2844 of SEQ ID NO: 84 | 3.84 | 1.9 | 1.8E-05 | 2.0
87 | 41442_at | nucleotides 3614 to 4179 of SEQ ID NO: 85 | 8.69 | 2.55 | 2.1E-05 | 3.4
89 | 33080_s_at | nucleotides 5056 to 5248 of SEQ ID NO: 86 | 170 | 51.95 | 2.1E-05 | 3.3
90 | 34742_at | nucleotides 774 to 926 of SEQ ID NO: 87 | 14.3 | 3.35 | 2.2E-05 | 4.3
91 | 37026_at | nucleotides 803 to 1325 of SEQ ID NO: 88 | 54.3 | 24.9 | 2.2E-05 | 2.2
92 | 34777_at | nucleotides 901 to 1449 of SEQ ID NO: 89 | 50.3 | 20.15 | 2.3E-05 | 2.5
93 | 36037_g_at | nucleotides 6396 to 6496 of SEQ ID NO: 90 | 13 | 2.35 | 2.4E-05 | 5.5
94 | 40644_g_at | nucleotides 2734 to 2853 of SEQ ID NO: 91 | 19.7 | 6.35 | 2.4E-05 | 3.1
95 | 35331_at | nucleotides 2038 to 2395 of SEQ ID NO: 92 | 5.16 | 2.2 | 2.6E-05 | 2.3
96 | 875_g_at | nucleotides 562 to 886 of SEQ ID NO: 93 | 98 | 14.75 | 3.2E-05 | 6.6
97 | 35773_i_at | the complement of nucleotides 98 to 398 of SEQ ID NO: 94 | 21.0 | 5.7 | 3.3E-05 | 3.7
98 | 39802_at | nucleotides 444 to 991 of SEQ ID NO: 95 | 18.9 | 5.2 | 3.4E-05 | 3.6
99 | 37220_at | nucleotides 150 to 425 of SEQ ID NO: 96 | 8.67 | 4.05 | 3.9E-05 | 2.1
100 | 37192_at | nucleotides 2337 to 2715 of SEQ ID NO: 97 | 94.8 | 23.6 | 3.9E-05 | 4.0
101 | 31610_at | nucleotides 224 to 512 of SEQ ID NO: 98 | 18.4 | 7.95 | 3.9E-05 | 2.3
102 | 37104_at | [illegible] 1673 of SEQ ID NO: 99 | 17.4 | 2.85 | 4.0E-05 | 6.1
103 | 38582_at | the complement of nucleotides 40 to 288 of SEQ ID NO: 100 | 5.58 | 2 | 4.1E-05 | 2.8
104 | 41169_at | nucleotides 890 to 1006 of SEQ ID NO: 101 | 6.22 | 2.25 | 4.2E-05 | 2.8
105 | 1274_s_at | nucleotides 741 to 899 of SEQ ID NO: 102 | 20.6 | 5.85 | 4.3E-05 | 3.5
106 | 40177_at | the complement of nucleotides 67 to 276 of SEQ ID NO: 103 | 3.93 | 1.85 | 4.6E-05 | 2.1
107 | 35659_at | nucleotides [illegible] to 3325 of SEQ ID NO: 104 | 19.2 | 40.25 | 4.8E-05 | 0.48
108 | 35337_at | nucleotides [illegible] to 2056 of SEQ ID NO: 105 | 124 | 52.85 | 4.9E-05 | 2.3
109 | 38584_at | nucleotides 1459 to 1700 of SEQ ID NO: 106 | 9.18 | 4.45 | 5.0E-05 | 2.1
110 | 1997_s_at | nucleotides 325 to 388 of SEQ ID NO: 107 | 4.2 | 8.65 | 5.2E-05 | 0.49
111 | 36162_at | nucleotides [illegible] to 1560 of SEQ ID NO: 108 | 37.2 | 10.25 | 5.2E-05 | 3.6
112 | 867_s_at | nucleotides 1820 to 1945 of SEQ ID NO: 109 | 11.3 | 3.9 | 5.5E-05 | 2.9
113 | 38799_at | nucleotides 2706 to 2791 of SEQ ID NO: 110 | 7.62 | 1.85 | 5.6E-05 | 4.1
115 | 36628_at | nucleotides 3321 to 3804 of SEQ ID NO: 111 | 11.1 | 5.55 | 6.2E-05 | 2.0
116 | 34545_at | nucleotides [illegible] to 1158 of SEQ ID NO: 112 | 8.13 | 3.65 | 6.4E-05 | 2.2
117 | 31346_at | nucleotides [illegible] to 1187 of SEQ ID NO: 113 | 6.64 | 2.7 | 6.4E-05 | 2.5
118 | 40926_at | nucleotides [illegible] to 14081 of SEQ ID NO: 114 | 18.1 | 6.8 | 6.5E-05 | 2.7
119 | 33803_at | nucleotides 3479 to 4005 of SEQ ID NO: 115 | 34.5 | 16.15 | 6.8E-05 | 2.1
120 | 296_at | SEQ ID NO: 116 | 15.0 | 6.55 | 6.9E-05 | 2.3
123 | 41617_at | the complement of nucleotides 41 to 485 of SEQ ID NO: 117 | 8.42 | 2.8 | 8.6E-05 | 3.0
125 | 1774_at | nucleotides 497 to 845 of SEQ ID NO: 118 | 5.93 | 2.55 | 8.8E-05 | 2.3
126 | 40990_at | nucleotides [illegible] to 1405 of SEQ ID NO: 119 | 8.29 | 3.45 | 8.8E-05 | 2.4
127 | 34798_at | nucleotides [illegible] to 1259 of SEQ ID NO: 120 | 39.8 | 15.55 | 8.9E-05 | 2.6
128 | 35674_at | nucleotides [illegible] to 4194 of SEQ ID NO: 121 | 6.69 | 2.9 | 9.7E-05 | 2.3
129 | 1368_at | nucleotides 4459 to 4885 of SEQ ID NO: 122 | 14.6 | 6.2 | 9.8E-05 | 2.4
130 | 430_at | nucleotides 444 to 960 of SEQ ID NO: 123 | 18 | 8.9 | 0.00010 | 2.0
131 | 39248_at | the complement of nucleotides 55 to 344 of SEQ ID NO: 124 | 17.8 | 47 | 0.00010 | 0.38
132 | 33932_at | nucleotides 2013 to 2558 of SEQ ID NO: 125 | 28.4 | 10.1 | 0.00011 | 2.8
133 | 35767_at | the complement of nucleotides 59 to 621 of SEQ ID NO: 126 | 60.1 | 27.55 | 0.00011 | 2.2
134 | 33516_at | SEQ ID NO: 127 | 149 | 23.2 | 0.00011 | 6.4
135 | 40120_at | nucleotides 426 to 948 of SEQ ID NO: 128 | 31.9 | 7.5 | 0.00011 | 4.3
136 | 31380_at | nucleotides [illegible] to 3534 of SEQ ID NO: 129 | 10.4 | 4.9 | 0.00012 | 2.1
137 | 35379_at | nucleotides 2491 to 2893 of SEQ ID NO: 130 | 18.7 | 7.4 | 0.00013 | 2.5
138 | 38138_at | nucleotides 133 to 574 of SEQ ID NO: 131 | 28.6 | 12.15 | 0.00013 | 2.4
139 | 355_s_at | nucleotides 250 to 850 of SEQ ID NO: 132 | 4.96 | 2.3 | 0.00013 | 2.2
141 | 36045_at | SEQ ID NO: 133 | 4.31 | 1.8 | 0.00014 | 2.4
142 | 39145_at | nucleotides [illegible] to [illegible] of SEQ ID NO: 134 | 5.98 | 1.8 | 0.00016 | 3.3
143 | 39423_f_at | nucleotides [illegible] to 1642 of SEQ ID NO: 135 | 6 | 2.95 | 0.00017 | 2.0
144 | 38598_at | the complement of nucleotides 149 to 213 of SEQ ID NO: 136 | 8.84 | 3.5 | 0.00017 | 2.5
145 | 33799_at | nucleotides 1981 to 2240 of SEQ ID NO: 137 | 29.6 | 13.85 | 0.00017 | 2.1
146 | 34319_at | nucleotides 39 to 419 of SEQ ID NO: 138 | 22.9 | 9.55 | 0.00017 | 2.4
147 | 36113_s_at | nucleotides 14630 to 14687 of SEQ ID NO: 139 | 4.13 | 2.05 | 0.00019 | 2.0
148 | 40848_g_at | nucleotides 3447 to 3808 of SEQ ID NO: 140 | 14.6 | 2.95 | 0.00019 | 4.9
149 | 2094_s_at | nucleotides 2713 to 3294 of SEQ ID NO: 141 | 66.6 | 136 | 0.00020 | 0.49
150 | 37185_at | nucleotides 1311 to 1761 of SEQ ID NO: 142 | 226 | 84.55 | 0.00020 | 2.7
151 | 35714_at | nucleotides 642 to 960 of SEQ ID NO: 143 | 7.71 | 3.2 | 0.00021 | 2.4
152 | 40951_at | nucleotides 1860 to 2099 of SEQ ID NO: 144 | 5.27 | 2.3 | 0.00022 | 2.3
153 | 37187_at | nucleotides 504 to 946 of SEQ ID NO: 145 | 59.1 | 19.55 | 0.00023 | 3.0
154 | 33506_at | [illegible] 3121 of SEQ ID NO: 146 | 7.07 | 2.2 | 0.00023 | 3.2
155 | 34430_at | nucleotides 2931 to 3119 of SEQ ID NO: 147 | 12.6 | 6.1 | 0.00025 | 2.1
156 | 40062_s_at | SEQ ID NO: 148 | 9.36 | 2.35 | 0.00027 | 4.0
157 | 37179_at | nucleotides 1069 to 1648 of SEQ ID NO: 149 | 10.1 | 3.15 | 0.00028 | 3.2
158 | 1486_at | nucleotides 145 to 529 of SEQ ID NO: 150 | 5.22 | 1.8 | 0.00028 | 2.9
159 | 40182_s_at | nucleotides 1849 to 2085 of SEQ ID NO: 151 | 5.73 | 2.7 | 0.00029 | 2.1
160 | 36419_at | nucleotides 850 to 1028 of SEQ ID NO: 152 | 4.22 | 1.8 | 0.00029 | 2.3
161 | 32581_at | SEQ ID NO: 153 | 4.24 | 2 | 0.00035 | 2.1
162 | 31308_at | nucleotides 36 to 484 of SEQ ID NO: 154 | 4 | 1.8 | 0.00039 | 2.2
163 | 36871_at | nucleotides 2087 to 2652 of SEQ ID NO: 155 | 14.2 | 2.55 | 0.00037 | 5.5
164 | 40956_at | nucleotides 2649 to 3183 of SEQ ID NO: 156 | 12.9 | 5.25 | 0.00038 | 2.45
165 | 35151_at | nucleotides 436 to 895 of SEQ ID NO: 157 | 4.18 | 1.9 | 0.00039 | 2.2
166 | 39543_at | the complement of nucleotides 106 to 619 of SEQ ID NO: 158 | 7.51 | 3.3 | 0.00041 | 2.3
167 | 725_i_at | nucleotides 1844 to 2146 of SEQ ID NO: 159 | 11.5 | 29.8 | 0.00043 | 0.39
168 | 31454_f_at | nucleotides 878 to 972 of SEQ ID NO: 160 | 5.6 | 2.55 | 0.00047 | 2.2
169 | 40366_at | nucleotides 2709 to 3063 of SEQ ID NO: 161 | 13.5 | 4.4 | 0.00048 | 3.1
170 | 1251_g_at | nucleotides 3043 to 3230 of SEQ ID NO: 162 | 8.53 | 2.45 | 0.00048 | 3.5
171 | 115_at | nucleotides 3083 to 3605 of SEQ ID NO: 163 | 42.2 | 17.25 | 0.00049 | 2.4
172 | 34447_at | nucleotides 2881 to 3318 of SEQ ID NO: 164 | 6.58 | 2.35 | 0.00050 | 2.8
173 | 38879_at | nucleotides 19 to 325 of SEQ ID NO: 165 | 40.0 | 17.25 | 0.00050 | 2.3
174 | 39389_at | nucleotides 686 to 1058 of SEQ ID NO: 166 | 14.9 | 7.4 | 0.00054 | 2.0
175 | 39729_at | nucleotides 712 to 968 of SEQ ID NO: 167 | 25.4 | 8.4 | 0.00057 | 3.0
176 | 39448_r_at | nucleotides 46 to 468 of SEQ ID NO: 168 | 8.07 | 16.45 | 0.00058 | 0.49
177 | 33759_at | nucleotides 1090 to 1582 of SEQ ID NO: 169 | 17.0 | 5 | 0.00059 | 3.4
178 | 33449_at | nucleotides 893 to 969 of SEQ ID NO: 170 | 10.5 | 5 | 0.00060 | 2.1
No. 


Qualifier 


CPS 


Averaged 
Expression 
Level in 
RCC Patients 
(n = 45) 


Averaged 
Expression 

Level in 
Disease-Free 
Humans 
(n = 20) 


Mest 
p-value 


Fold 
Change 
(RCC/ 
Disease- 
Free) 


179 


31812_at 


nucleotides 1 047 to 
1464 ofSEQ ID NO: 
171 


32.2 


12.55 


0.00061 


2.6 


180 


40578_s_at 


nucleotides 2081 to 
2425 ofSEQ ID NO: 
172 


12.1 


2.45 


0.00078 


4.9 


181 


40766_at 


SEQ ID NO: 173 


11.4 


4.25 


0.00079 


2.7 


182 


31320_at 


niiplpotiHfQ to 

llUvl&ullUvo U J 1 IU 

1169 of SEQ ID NO: 
174 


3.84 


1.8 


0.00081 


2.1 


183 


34378_at 


nucleotides 1217 to 
1314 of SEQ ID NO: 
175 


102 


28.2 


0.00092 


3.6 


184 


40773_at 


nucleotides 37 to 522 
of SEQ ID NO: 176 


9.56 


3.15 


0.0010 


3.0 


185 


38726_at 


the complement of
nucleotides 125 to 494 
of SEQ ID NO: 177 


20.8 


3.6 


0.0010 


5.8 


186 


1832_at 


4132 of SEQ ID NO: 
178 


5.00 


2.05 


0.0010 


2.4 


187 


36543_at 


nucleotides [illegible] to
2013 of SEQ ID NO: 
179 


6.87 


1.95 


0.0011 


3.5 


188 


137_at 


nucleotides 1138 to
1564 of SEQ ID NO: 
180 


6.02 


1.8 


0.0012 


3.3 


189 


38585_at 


SEQ ID NO: 181 


258 


74.25 


0.0012


3.5 


190 


34022_at 


nucleotides 426 to 993 
of SEQ ID NO: 182 


32.2 


4.25 


0.0012 


7.6 


191 


38021_at 


nucleotides 14286 to 
14757 of SEQ ID NO: 
183 


5.67 


2.25 


0.0013 


2.5 


192 


33143_s_at 


nucleotides 1523 to 
1918 of SEQ ID NO: 
184 


18.7 


6.1 


0.0015 


3.1 


194 


40850_at 


nucleotides 1048 to
1504 of SEQ ID NO:
185


16.9


4.1


0.0016


4.1


195 


36766_at 


nucleotides 167 to 666 
of SEQ ID NO: 186 


24.5 


11.3 


0.0017 


2.2 


196 


38201_at 


nucleotides 836 to 
1155 of SEQ ID NO: 
187 


7.18 


3.05 


0.0018 


2.4 


199 


2092_s_at 


nucleotides 824 to 
1229 of SEQ ID NO: 
188 


9.78 


2.35 


0.0022 


4.2 


201 


408_at 


nucleotides 1229 to
1851 of SEQ ID NO: 
189 


21.1 


2.4 


0.0028 


8.8 


202 


36058_at 


nucleotides 1083 to 
1550 of SEQ ID NO: 
190 


29.6 


11.7 


0.0030 


2.5 


205 


38429_at 


nucleotides 7939 to 
8395 of SEQ ID NO: 
192 


5.00 


2.4 


0.0035 


2.1 


206 


502_s_at 


nucleotides 1959 to
2156 of SEQ ID NO: 
193 


5.18 


1.85 


0.0041 


2.8 


207 


33802_at 


nucleotides 51072 to 
51587 of SEQ ID NO: 
194 


21.4 


10.25 


0.0047 


2.1 


208 


38010_at


nucleotides 1044 to
1494 of SEQ ID NO: 
195 


6.58 


3.25 


0.0050 


2.0 


209 


41046_s_at 


nucleotides 5551 to 
6046 of SEQ ID NO: 
196 


4.76 


2.2 


0.0068 


2.2 


210 


39095_at 


nucleotides 5774 to 
5945 of SEQ ID NO: 
197 


5.87 


1.8 


0.0072 


3.3 


211 


39402_at 


nucleotides 927 to 
1473 of SEQ ID NO: 
198 


71.6


18.45 


0.0073 


3.9 


212 


37184_at 


nucleotides 1631 to 
2037 of SEQ ID NO: 
199 


6.36 


2.7 


0.0074 


2.4 


213 


38273_at 


nucleotides 1251 to 
1576 of SEQ ID NO:
200 


6.47 


2.5 


0.0075 


2.6 


214 


35894_at 


nucleotides 1736 to
2016 of SEQ ID NO:
201 


4.67 


1.8 


0.0076 


2.6 


215 


33429_at 


nucleotides 937 to 
1538 of SEQ ID NO:
202 


6.38 


2.6 


0.0083 


2.5 


216 


558_at 


nucleotides 5446 to 
5866 of SEQ ID NO: 
203 


36.8 


11.3 


0.0084 


3.3 


217 


41575_at 


nucleotides 2056 to 
2530 of SEQ ID NO: 
204 


5.09 


2.15 


0.0086 


2.4 


218 


39780_at 


nucleotides 2550 to 
3078 of SEQ ID NO: 
205 


5.2 


2.6 


0.0094 


2 


219 


1257_s_at 


nucleotides 2590 to 
2840 of SEQ ID NO: 
206 


33.6 


14.35 


0.0095 


2.3 


220 


32904_at 


SEQ ID NO: 207 


8.78 


20.85 


0.0096 


0.42 


221 


31499_s_at 


nucleotides 251 to 854 
of SEQ ID NO: 208 


16.0 


6.6 


0.010 


2.4 


222 


1069_at 


nucleotides 8872 to 
9184 of SEQ ID NO: 
209 


7.82 


2.95 


0.011 


2.7 


223 


39413_at


nucleotides 6717 to 
6771 of SEQ ID NO: 
210 


4.91 


1.8 


0.012 


2.7 


224 


34281_at


nucleotides 1207 to
1559 of SEQ ID NO: 
211 


9.4 


3.4 


0.012 


2.8 


225 


33914_r_at 


SEQ ID NO: 212 


19.6 


2.15 


0.012 


9.1 


226 


35762_at 


nucleotides 4753 to 
5179 of SEQ ID NO: 
213 


8.89 


2.8 


0.013 


3.2 


227 


36372_at


nucleotides 2437 to
3029 of SEQ ID NO:
214 


6.78 


2.95 


0.013 


2.3 


228 


32451_at


nucleotides 1020 to 
1387 of SEQ ID NO: 
215 


6.31 


1.95 


0.013 


3.2 


229 


40385_at 


nucleotides 207 to 742 
of SEQ ID NO: 216


6.93 


2.35 


0.014 


3.0 


230 


35036_at 


nucleotides 2895 to 
3261 of SEQ ID NO: 
217 


5.4 


2.1 


0.014 


2.6 


231 


34014_f_at 


nucleotides 664 to 
1000 of SEQ ID NO: 
218 


8.38 


2.15 


0.015 


3.9 


232 


37120_at 


nucleotides 1870 to 
2379 of SEQ ID NO: 
219 


12.2 


3.45 


0.016 


3.5 


234 


32054_at 


nucleotides 1916 to 
2038 of SEQ ID NO: 
220 


6.13 


2.3 


0.017 


2.7 


235 


33742_f_at 


nucleotides 248 to 367 
of SEQ ID NO: 221 


8.09 


1.8 


0.019 


4.5 


236 


31719_at 


nucleotides 7039 to 
7633 of SEQ ID NO: 
222 


3.64 


1.8 


0.020 


2.0 


237 


35418_at


nucleotides 471 to 714
of SEQ ID NO: 223 


11.8 


1.85 


0.021 


6.4 


239 


1407_g_at


nucleotides 1768 to
1958 of SEQ ID NO: 
224 


7.11 


2.95 


0.022 


2.4 


240 


31666_f_at 


nucleotides 62 to 339 
of SEQ ID NO: 225 


13.8 


1.8 


0.024 


7.7 


241 


38299_at 


nucleotides 728 to 
1053 of SEQ ID NO: 
226 


23.9 


3 


0.025 


8.0 


242 


40517_at 


nucleotides [illegible] to

5667 of SEQ ID NO: 
227 


7.84 


3.05 


0.025 


2.6 


243 


1350_at 


nucleotides 2099 to 
2350 of SEQ ID NO:
228


7.8


2.85


0.026


2.7


244 


207_at 


nucleotides [illegible] to

2082 of SEQ ID NO: 
229 


9.07 


3.45 


0.028 


2.6 


245 


39166_s_at 


nucleotides 1583 to 
1790 of SEQ ID NO: 
230 


8.42 


2.75 


0.030 


3.1 


246 


31574_i_at 


nucleotides 39 to 78 of 
SEQ ID NO: 231 


16.8 


1.8 


0.034 


9.3 


247 


40159_r_at 


nucleotides 970 to 
1341 of SEQ ID NO: 

232 


20.2 


8.7 


0.035 


2.3 


248 


33244_at 


SEQ ID NO: 233 


9.29 


3.75 


0.037 


2.5 


249 


2041_i_at 


nucleotides [illegible] to
3773 of SEQ ID NO: 
234 


66.5 


2.35 


0.038 


28 


250 


40635_at


nucleotides [illegible] to
1771 of SEQ ID NO: 
235 


12.9 


5.5 


0.039 


2.3 


251 


38908_s_at 


nucleotides 2043 to 
2283 of SEQ ID NO: 
236 


20.3 


5.65 


0.039 


3.6 


252 


732_f_at 


SEQ ID NO: 237 


21.4 


8.5 


0.042 


2.5 


253 


32579_at 


nucleotides [illegible] to
5246 of SEQ ID NO: 
238 


40.1 


7.75 


0.043 


5.2 


254 


33021_at


nucleotides [illegible] to
1878 of SEQ ID NO: 
239 


8.42 


4.2 


0.047 


2.0 


255 


35175_f_at


1447 of SEQ ID NO: 
285 


118.47 


191.35 


4.4E-10 


0.62 


256 


32587_at 


nucleotides 4939 to 
5425 of SEQ ID NO: 
286 


61.16 


117.80 


5.2E-10 


0.52 


257 


37337_at 


the complement of 
nucleotides 7 to 362 of
SEQ ID NO: 287


14.04


23.55


5.2E-10


0.60


258 


329_s_at 


SEQ ID NO: 288 


8.44 


16.00 


3.0E-10 


0.53 


259 


36589_at 


nucleotides 797 to 
1192 of SEQ ID NO: 
289 


15.78 


23.25 


1.7E-08 


0.68 


260 


33828_at 


SEQ ID NO: 328 


13.07 


20.10 


6.7E-08 


0.65 


261 


41787_at 


the complement of 
nucleotides 77 to 413 
of SEQ ID NO: 291 


6.04 


3.50 


2.1E-08 


1.73 


262 


41220_at 


nucleotides 3638 to 
3874 of SEQ ID NO: 
292 


169.69 


227.65 


3.8E-07 


0.75 


263 


38590_r_at 


nucleotides 575 to 
1111 of SEQ ID NO: 
293 


201.78 


274.50 


1.4E-07 


0.74 


264 


40018_at 


nucleotides 5780 to 
6213 of SEQ ID NO: 
294 


7.84 


4.45 


2.4E-07 


1.76 


265 


39155_at 


nucleotides 1548 to
2085 of SEQ ID NO: 
295 


19.22 


25.80 


3.9E-08 


0.75 


266 


37668_at 


nucleotides 600 to 948 
of SEQ ID NO: 296 


10.80 


17.95 


2.9E-11 


0.60 


267 


39136_at 


nucleotides 4031 to 
4415 of SEQ ID NO: 
297 


15.33


10.55 


3.7E-06 


1.45 


268 


1125_s_at 


nucleotides 43 to 226 
of SEQ ID NO: 298 


8.42 


4.50 


5.7E-08 


1.87 


269 


1211_s_at 


nucleotides 972 to 
1076 of SEQ ID NO: 
299 


7.02 


3.80 


4.5E-07 


1.85 


270 


1445_at 


nucleotides [illegible] to
1643 of SEQ ID NO: 
300 


6.47 


3.55 


3.6E-07 


1.82 


271 


32405_at 


nucleotides 5804 to 
6242 of SEQ ID NO: 
301 


7.69 


4.50 


2.9E-07 


1.71 


272 


32635_at 


nucleotides 3240 to 
3424 of SEQ ID NO:
302 


8.00 


4.65 


6.9E-05 


1.72 


273 


36331_at


nucleotides 2550 to
3110 of SEQ ID NO:
303 


6.42 


3.30 


7.2E-07 


1.95 


274 


37788_at 


nucleotides 1293 to 1655
of SEQ ID NO: 304


4.62 


2.35 


1.2E-05 


1.97 


275 


38228_g_at


nucleotides 1878 to
2045 of SEQ ID NO:
305 


6.53 


4.25 


5.4E-05 


1.54 


276 


39708_at 


SEQ ID NO: 306 


32.13 


19.65 


9.5E-08 


1.64 


277 


40076_at 


nucleotides 1683 to
2285 of SEQ ID NO: 
307 


59.36 


35.35 


2.5E-07 


1.68 


278 


40177_at 


the complement of
nucleotides 67 to 276 
of SEQ ID NO: 308 


3.93 


1.85 


4.6E-05 


2.13 


279 


1891_at 


nucleotides [illegible] to
2738 of SEQ ID NO: 
309 


9.16 


4.65 


1.3E-08 


1.97 


280 


31536_at 


nucleotides [illegible] to
4018 of SEQ ID NO:
310 


25.56 


15.75 


1.2E-08 


1.62 


281 


32719_at 


nucleotides 1261 to 
1780 of SEQ ID NO: 
311 


7.16 


4.05 


9.6E-08 


1.77 


282 


33371_s_at 


nucleotides 420 to 879 
of SEQ ID NO: 312 


21.31 


11.05 


8.6E-09 


1.93 


283 


35434_at 


nucleotides 1591 to 
1897 of SEQ ID NO: 
313 


12.62 


7.25 


1.7E-08 


1.74 


284 


40167_s_at 


nucleotides 1405 to 
1643 of SEQ ID NO: 
314 


9.11 


6.45 


3.3E-06 


1.41 


285 


649_s_at 


nucleotides 1038 to 
1632 of SEQ ID NO: 
317 


172.87 


266.70 


3.1E-06 


0.65 


286 


31492_at 


nucleotides 255 to 758 
of SEQ ID NO: 318


47.91 


64.10 


9.7E-09 


0.75 


287 


31955_at 


nucleotides 1 to 475 of 
SEQ ID NO: 319


316.33 


435.15 


1.4E-08 


0.73 


288 


35125_at 


SEQ ID NO: 330 


404.47 


547.05 


5.1E-07 


0.74 


289 


36463_at 


nucleotides 3746 to 
4119 of SEQ ID NO:
321 


13.49 


20.05 


1.7E-09 


0.67 


290 


36786_at 


SEQ ID NO: 329 


204.07 


304.40 


1.1E-09 


0.67 


291 


38269_at 


nucleotides 1235 to 
1699 of SEQ ID NO: 
323 


27.64 


40.25 


3.9E-07 


0.69 


292 


38527_at 


nucleotides 2145 to 
2484 of SEQ ID NO: 
324 


53.49 


70.70 


6.8E-09 


0.76 


293 


40610_at 


SEQ ID NO: 331


12.56 


20.50 


2.7E-06 


0.61 


294 


41506_at 


nucleotides 1440 to 
1952 of SEQ ID NO: 
326 


8.11 


13.45 


2.7E-07 


0.60 


295 


41604_at 


nucleotides 1095 to 
1400 of SEQ ID NO: 
327 


13.60


21.30 


3.5E-07 


0.64 
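

The fold change column of Table 2 appears to be the ratio of the two averaged expression columns (RCC patients over disease-free humans). The following is a minimal sketch of that arithmetic under this assumption, not the analysis code used to generate the table; the function names `fold_change` and `direction` are hypothetical, and the three sample rows are copied from Table 2.

```python
# Illustrative sketch only: recomputes the Table 2 fold change column
# from the two averaged expression columns. Function names are hypothetical.

def fold_change(avg_rcc: float, avg_disease_free: float) -> float:
    """Ratio of averaged expression in RCC patients to disease-free humans."""
    if avg_disease_free <= 0:
        raise ValueError("disease-free average must be positive")
    return avg_rcc / avg_disease_free


def direction(ratio: float) -> str:
    """Label a qualifier as elevated or reduced in RCC peripheral blood."""
    return "elevated in RCC" if ratio > 1.0 else "reduced in RCC"


# (qualifier, averaged level in RCC, averaged level in disease-free), from Table 2
rows = [
    ("39543_at", 7.51, 3.3),
    ("725_i_at", 11.5, 29.8),
    ("35175_f_at", 118.47, 191.35),
]

for qualifier, rcc, healthy in rows:
    ratio = fold_change(rcc, healthy)
    print(f"{qualifier}: fold change {ratio:.2f} ({direction(ratio)})")
```

The tabulated p-values come from t-tests on the per-sample data, which Table 2 does not reproduce, so they cannot be recomputed from the averages alone.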



Table 3. SEQ ID NOs and the Corresponding Entrez Accession Numbers



SEQ ID 
NO 


Corresponding 
Entrez Database 
Accession No. 


Reported Source of the Corresponding Entrez Sequence 


1 


AF051152 


Homo sapiens Toll/interleukin-1 receptor-like protein 
4 (TIL4) mRNA 


2 


AA978353 




3 


AB006780 


Homo sapiens mRNA for galectin-3 


4 


AB013382 


Homo sapiens mRNA for DUSP6 


6 


U66359 


Human T54 protein (T54) mRNA 


7 


X75593 


Homo sapiens mRNA for rab 13


8 


X91348 


Homo sapiens predicted non coding cDNA (DGCR5) 


9 


L35240 


Human enigma gene 


10 


AF017257


Homo sapiens chromosome 21 derived BAC 
containing erythroblastosis virus oncogene homolog 2 
protein (ets-2) gene 


11 


AB011161 


Homo sapiens mRNA for KIAA0589 protein 


12


D43642 


Human YL-1 mRNA for YL-1 protein (nuclear 
protein with DNA-binding ability) 


13 


AF055000 


Homo sapiens clone 24519 unknown mRNA 


14 


AB006537 


Homo sapiens mRNA for interleukin 1 receptor 
accessory protein 


15 


X75042 


Homo sapiens rel proto-oncogene mRNA 


16 


AF032108 


Homo sapiens integrin alpha-7 mRNA 


17 


L07592 


Human peroxisome proliferator activated receptor
mRNA 


18 


X52015 


Homo sapiens mRNA for interleukin-1 receptor
antagonist 


19 


AF025533 


Homo sapiens leukocyte immunoglobulin-like 
receptor-3 (LIR-3) mRNA 


21 


U05770 


Human annexin V (ANX5) gene, exon 13 


22 


W26700 




23 


AF052111 


Homo sapiens clone 23953 mRNA sequence 


24 


M64925 


Human palmitoylated erythrocyte membrane protein 
(MPP1) mRNA


25


M19267


Human tropomyosin mRNA 


26 


M62896 


Human lipocortin (LIP) 2 pseudogene mRNA 


27 


M13207 


Human granulocyte-macrophage colony-stimulating 
factor (CSF1) gene 


28 


D86961 


Human mRNA for KIAA0206 gene 


29 


AA187563




30 


J05581 


Human polymorphic epithelial mucin (PEM) mRNA 


31 


AF035819 


Homo sapiens macrophage receptor MARCO mRNA 


32 


X51362 


Human mRNA for dopamine D2 receptor 


33 


AA844998 




34 


AB008775 


Homo sapiens AQP9 mRNA for aquaporin 9 


35 


AB000520 


Homo sapiens mRNA for APS 


36 


X60364 


Human ALAS mRNA for 5-aminolevulinate synthase 

precursor 


37 


X12451 


Human mRNA for pro-cathepsin L (major excreted 
protein MEP) 


38 


AL080235 


Homo sapiens mRNA; cDNA DKFZp586E1621 
(from clone DKFZp586E1621) 


40 


D32143 


Human mRNA for biliverdin-IXbeta reductase I 


41 


L22075 


Homo sapiens guanine nucleotide regulatory protein 
(G13) mRNA 


42


D87116 


Human mRNA for MAP kinase kinase 3b 


43


AA135683 




44 


AF079221 


Homo sapiens BCL2/adenovirus E1B 19kDa-
interacting protein 3a mRNA


45


U48213


Human D-site binding protein gene, exon 4


46


U91316


Human acyl-CoA thioester hydrolase mRNA


47 


AF059202 


Homo sapiens ACAT related gene product 1 mRNA 


48




Human guanylate kinase (GUK1) mRNA


49 


L42243 


Homo sapiens (clone 51H8) alternatively spliced 
interferon receptor (IFNAR2) gene, exon 9


50 


D45421 


Human mRNA for phosphodiesterase I alpha 


51 


AL096737 


Homo sapiens mRNA; cDNA DKFZp434F152 (from
clone DKFZp434F152)


52 


L32831 


Homo sapiens G protein-coupled receptor (GPR3) 

gene 


53 


X07834 


Human mRNA for manganese superoxide dismutase 
(EC 1.15.1.1) 


54 


AJ243797 


Homo sapiens mRNA for deoxyribonuclease III (drn3

gene) 


55 


H12458 




56 


S78798 


1-phosphatidylinositol-4-phosphate 5-kinase isoform
C [human,
peripheral blood leukocytes, mRNA, 1835 nt]


57 


M94856 


Human fatty acid binding protein homologue (PA-FABP) mRNA


58


TPKfT7n 
JyJDKJ f\J 


Human type IV collagenase mRNA


59 


J04027 


Human plasma membrane Ca2+ pumping ATPase
mRNA


60 


U43843 


Human h-neuro-d4 protein mRNA 


61




Human mRNA for HM145


62


AJ000480


Homo sapiens mRNA for C8FW phosphoprotein


63




Human complement cytolysis inhibitor (CLI) mRNA


64 


D30783 


Homo sapiens mRNA for epiregulin 


65 


AF017786 


Homo sapiens phosphatidic acid phosphohydrolase 
homolog (Dri42) mRNA 


66 


X79535 


Homo sapiens mRNA for beta tubulin, clone nuk 278 


67 


D14689 


Human mRNA for KIAA0023 gene


68 


AL031230 


Human DNA sequence from clone 73M23 on
chromosome 6p22.2-22.3, contains the 5' part of the
possibly alternatively spliced gene for
Phosphatidylinositol-glycan-specific Phospholipase D
1 precursor (EC 3.1.4.50, PIGPLD1, Glycoprotein
Phospholipase D, Glycosyl-Phosphatidylinositol
specific Phospholipase D), the gene for NAD+-dependent
succinic semialdehyde dehydrogenase
(SSADH, EC 1.2.1.24) and the 3' part of the
KIAA0319 gene; contains ESTs,
STSs, GSSs and a putative CpG island, complete
sequence


69 


AL049963 


Homo sapiens mRNA; cDNA DKFZp564A132 (from 
clone DKFZp564A132) 


70 


Z32684 


Homo sapiens mRNA for membrane transport protein 

(XK gene) 


71




Homo sapiens mRNA for KIAA0837 protein


72 


X12496


Human mRNA for erythrocyte membrane
sialoglycoprotein beta (glycophorin C)


73 


L23959 


Homo sapiens E2F-related transcription factor
(DP-1) mRNA


74


U61836


Human putative cyclin G1 interacting protein mRNA


75 


U43774 


Human Fc alpha receptor, splice variant FcalphaR a.2
(CD89) mRNA 


76


M35999


Human platelet glycoprotein IIIa (GPIIIa) mRNA


77


L07648 


Human MXI1 mRNA 


78




Human DNA-binding protein A (dbpA) gene, 3' end


79 


AF061034 


Homo sapiens FIP2 alternatively translated mRNA 


80 


U29091 


Homo sapiens selenium-binding protein (hSBP)
mRNA 


81 


U68111 


Human protein phosphatase inhibitor 2 (PPP1R2) 

gene, exon 6


82 


X82460 


Homo sapiens mRNA for 15-hydroxy prostaglandin
dehydrogenase 


84 


U58917 


Homo sapiens IL-17 receptor mRNA 


85 


AB010419 


Homo sapiens mRNA for MTG8-related protein
MTG16a 


86 


AB007943 


Homo sapiens mRNA for KIAA0474 protein 


87


Z23115 


Homo sapiens bcl-xL mRNA 


88 


AF001461 


Homo sapiens Kruppel-like zinc finger protein Zf9 

mRNA 


89 


D14874 


Homo sapiens mRNA for adrenomedullin precursor 


90 


J05500 


Human beta-spectrin (SPTB) mRNA 


91


M34480


Human platelet glycoprotein IIb (GPIIb) mRNA


92 


U97067 


Homo sapiens alpha-catenin-like protein mRNA 


93 


M26683 


Human interferon gamma treatment inducible mRNA 


94 


AA527880 




95 


X72308 


Homo sapiens mRNA for monocyte chemotactic 
protein-3 (MCP-3) 


96 


M63835 


Human IgG Fc receptor I gene, exon 6 


97 


U28389 


Human dematin 52 kDa subunit mRNA 


98 


U21049 


Homo sapiens DD96 mRNA 


99 


L40904 


Homo sapiens peroxisome proliferator activated 
receptor gamma 
(PPARG) mRNA 


100 


AI961220 




101 




Homo sapiens mRNA for urokinase plasminogen 
activator receptor 


102 


L22005 


Human ubiquitin conjugating enzyme mRNA 


103 


AI732885 




104 


U00672 


Human interleukin-10 receptor mRNA


105 


AL050254 


Novel human gene mapping to chromosome 22


106 


AF026939 


Homo sapiens CIG49 (cig49) mRNA 


107


U19599


Human (BAX delta) mRNA


108 


X64364 


Homo sapiens mRNA for M6 antigen 


109 


U12471 


Human thrombospondin-1 gene 


110 


AF068706 


Homo sapiens gamma2-adaptin (G2AD) mRNA 


111 


L42542 


Human RLIP76 protein mRNA 


112 


AF070587 


Homo sapiens clone 24741 mRNA sequence 


113 


AJ001481 


Homo sapiens mRNA for DUX1 protein 


114 


U36341 


Human Xq28 cosmid, creatine transporter (SLC6A8) 
gene, complete 
cds, and CDM gene, partial cds 


115 


J02973 


Human thrombomodulin gene 


116 


AF141349 


Homo sapiens beta-tubulin mRNA 


117 


AI349593 




118 


L06895 


Homo sapiens antagonizer of myc transcriptional 
activity (Mad) mRNA 


119 


AF065389 


Homo sapiens tetraspan NET-4 mRNA 


120 


Z35491 


Homo sapiens mRNA for novel glucocorticoid 
receptor-associated protein 


121 


AB023211 


Homo sapiens mRNA for KIAA0994 protein 


122 


M27492 


Human interleukin 1 receptor mRNA 


123 


X00737


Human mRNA for purine nucleoside phosphorylase 
(PNP; EC 2.4.2.1) 


124 


N74607 




125 


X17644


Human GST1-Hs mRNA for GTP-binding protein


126 


AI565760 




128 


X90999 


Homo sapiens mRNA for Glyoxalase II 


129 


AF059198 


Homo sapiens protein kinase/endoribonuclease
(IRE1) mRNA


130 


X54412 


Human mRNA for alpha1(IX) collagen (long form)


131 


D38583 


Human mRNA for calgizzarin 


1 ^9 


D38037 


Human mRNA for FK506-binding protein 12kDa 
(hFKBP-12) homologue 


134 


J02854 


Human 20-kDa myosin light chain (MLC-2) mRNA 


135 


AJ000644 


Homo sapiens mRNA for SPOP 


136 


AI679353 




137 


U76248 


Human hSIAH2 mRNA 


138 


AA131149 




139 


AJ011712


Homo sapiens TNNT1 gene, exons 1-11 (and joined 

CDS) 


140 


AB018293 


Homo sapiens mRNA for KIAA0750 protein 


141


K00650 


Human fos proto-oncogene (c-fos) 


142 


Y00630


Human mRNA for Arg-Serpin (plasminogen
activator-inhibitor 2, PAI-2)


143 


U89606 


Human pyridoxal kinase mRNA 


144 


A T A/1 no Crt 


Homo sapiens mRNA; cDNA DKFZp564D113 (from
clone DKFZp564D113) 


145 


M36820 


Human cytokine (GRO-beta) mRNA 


146 


U96919 


Homo sapiens inositol polyphosphate 4-phosphatase 
type I-beta mRNA 


147 


U70732 


Human glutamate pyruvate transaminase (GPT) gene 


149 


S77763 


nuclear factor erythroid 2 isoform f, basic leucine 
zipper protein {alternatively spliced, exon 1f}
[human, fetal liver, mRNA, 1678 nt]


150 


L37127 


Homo sapiens RNA polymerase II mRNA 


151


AF055027 


Homo sapiens clone 24658 mRNA sequence 


152 


AF038171 


Homo sapiens clone 23671 mRNA sequence 


154 


L17330 


Human pre-T/NK cell associated protein (6H9A) 
mRNA 


155 


M60298 


Human erythrocyte membrane protein band 4.2 
(EPB42) mRNA 


156 


AyUoD / 


Homo sapiens mRNA for -14 gene, containing globin 
regulatory element 


157


AF089814


Homo sapiens growth suppressor related (DOC-1R) 

mRNA 


158 


AI077476 




159 


K02401 


Human chorionic somatomammotropin gene hCS-1 


160 


AF034209 


Homo sapiens RIG-like 5-6 mRNA 


161 


M25322 


Human granule membrane protein- 140 mRNA 


162 


M64788 


Human GTPase activating protein (rap1GAP) mRNA


163 


X14787 


Human mRNA for thrombospondin 


164 


U62433 


Human nicotinic acetylcholine receptor alpha4 
subunit precursor, mRNA 


165 


D83664 


Human mRNA for CAAF1 (calcium-binding protein 
in amniotic fluid 1) 


166 


M38690


Human CD9 antigen mRNA 


167 


L19185 


Human natural killer cell enhancing factor (NKEFB) 

mRNA 


168 


W27095 




169 


X04327 


Human erythrocyte 2,3-bisphosphoglycerate mutase
mRNA EC 2.7.5.4 


170 


AF054185 


Homo sapiens proteasome subunit HSPC mRNA 


171 


M24470 


Human glucose-6-phosphate dehydrogenase 


172 


M77016 


Human tropomodulin mRNA


174 


U18548 


Human GPR12 G protein coupled-receptor gene 


175 


X97324 


Homo sapiens mRNA for adipophilin 


176 


L03785 


Human regulatory myosin light chain (MYL5) mRNA 


177 


W80399 




178 


M62397 


colorectal mutant cancer protein mRNA 


179 


J02931 


Human placental tissue factor (two forms) mRNA 


180 


U65404 


Human erythroid-specific transcription factor EKLF 

mRNA 


182 


M36821 


Human cytokine (GRO-gamma) mRNA 


183 


U53204 


Human plectin (PLEC1) mRNA 


184 


U81800


Homo sapiens monocarboxylate transporter (MCT3) 

mRNA 


185 


L37033 


Human FK-506 binding protein homologue (FKBP38) 

mRNA 


186 


X55988 


Human EDN mRNA for eosinophil derived 
neurotoxin 


187 


U21551 


Human ECA39 mRNA


188 


J04765 


Human osteopontin mRNA 


189 


X54489 


Human gene for melanoma growth stimulatory 
activity (MGSA) 


190 


AL096741 


Homo sapiens mRNA; cDNA DKFZp586O0223 
(from clone DKFZp586O0223) 


192 


U29344 


Human breast carcinoma fatty acid synthase mRNA 


193 


U37431 


Human HOXA1 mRNA, long transcript and 
alternatively spliced forms 


194 


Z82244 


Human DNA sequence from clone CTA-286B10 on 
chromosome 22; contains the 3' end of the TOM1
gene for target of myb1 (chicken) homolog, the
HMOX1 gene for Heme Oxygenase (decycling) 1
(HO-1, EC 1.14.99.3), the MCM5 gene for
minichromosome maintenance deficient (S.
cerevisiae) 5 (cell division cycle 46, DNA Replication
Licensing Factor, P1-CDC46), ESTs, STSs, GSSs,
and two putative CpG islands 


195 


AF002697 


Homo sapiens E1B 19K/Bcl-2-binding protein Nip3 
mRNA, nuclear gene encoding mitochondrial protein


196 


X95808 


Homo sapiens mRNA for protein encoded by a 
candidate gene, DXS6673E, for mental retardation 


197 


M58018 


Homo sapiens beta-myosin heavy chain (MYH7) 
mRNA 


198 


M15330 


Human interleukin 1-beta (IL1B) mRNA 


199 


L37792 


Homo sapiens syntaxin 1A mRNA 


200 


AJ006268 


Homo sapiens mRNA for putative ATPase 


201 


X14362 


Human CR1 mRNA for C3b/C4b receptor secreted 

form 


202 


AL050225 


Homo sapiens mRNA; cDNA DKFZp586M1523 
(from clone DKFZp586M1523) 


203 


M98776 


Human keratin 1 gene 


204 


AF070571 


Homo sapiens clone 24739 mRNA sequence 


205 


M29551 


Human calcineurin A2 mRNA 


206 


L42379 


Homo sapiens bone-derived growth factor (BPGF-1) 

mRNA 


208 


X16863 


Human Fc-gamma RIII-1 cDNA for Fc-gamma 
receptor III-1 (CD16)


209


U04636 


Human cyclooxygenase-2 (hCox-2) gene 


210 


AJ001189 


Homo sapiens mRNA for oligophrenin 1 


211 


AF039555 


Homo sapiens visinin-like protein 1 (VSNL1) mRNA 


213 


AB007952 


Homo sapiens mRNA for KIAA0483 protein 


214 


U51333 


Human hexokinase III (HK3) mRNA 


215 


L35848 


Homo sapiens IgE receptor beta chain (HTm4) 
mRNA 


216 


U64197 


Homo sapiens chemokine exodus-1 mRNA


217 


U94333 


Human C1q/MBL/SPA receptor C1qR(p) mRNA


218 


D10216 


Human mRNA for Pit-1/GHF-1


219 


X91817 


Homo sapiens mRNA for transketolase-like protein 
(2418 bp) 


220 


AF048732 


Homo sapiens cyclin T2b mRNA 


221 


W27838 




222 


X02761 


Human mRNA for fibronectin (FN precursor) 


223 


J04178 


Human abnormal beta-hexosaminidase alpha chain 
(HEXA) mRNA, chromosome 15q23-q24 


224 


M21985 


Human steroid receptor TR2 mRNA 


225 


W28731 




226 


X04430 


Human IFN-beta 2a mRNA for interferon-beta-2 


227 


AB002370 


Human mRNA for KIAA0372 gene 


228 


U02388 


Homo sapiens cytochrome P450 4F2 (CYP4F2)
mRNA 


229 


M86752 


Human transformation-sensitive protein (IEF SSP 
3521) mRNA 


230


D83174


Human mRNA for collagen binding protein 2


231 


M14087


Human HL14 gene encoding beta-galactoside-binding
lectin, 3' end, clone 2


232 


M55067 


Human 47-kD autosomal chronic granulomatous
disease protein mRNA


OIA 


IYIIH/jZ 


T-Tt ity"i an nl fy^Tlf* 
Xl UillClll L dUl gdlC 






nouio oapieiio iiuiiiim~i iiiivin^v 


236 


AL096744 


Homo sapiens mRNA; cDNA DKFZp566H033 (from
clone DKFZp566H033)


237


M55405


Human intestinal mucin (MUC-3) mRNA


238


U29175


Human transcriptional activator (hBRG1) mRNA


239


/vr vjo j 3 1 *t 


Homo sapiens clone 23651 mRNA sequence


285




H. sapiens mRNA for elongation factor 1 alpha-2


286


Uu / OUZ 


Uumflti Ti^l 1 H cpnf* 

1 111111011 1 lol 1 Kl gvllb 


287






288


Z11584


Homo sapiens mRNA for NuMA protein


289


X15414


Human mRNA for aldose reductase (EC 1.1.1.2)


290


AP0^^9^9 


riuiiiu octpiciio jj^vr j / ^aj.tvi j i j fttut 


291


AI432442




292


AB023208


Homo sapiens mRNA for KIAA0991 protein


293 


M14630 


Human prothymosin alpha mRNA 


294


AB007870


Homo sapiens KIAA0410 mRNA


295 


D67025 


Homo sapiens mRNA for proteasome subunit p58


296 


M69039 


Human pre-mRNA splicing factor SF2p32 


297 


AB017642


Homo sapiens mRNA for oxidative-stress responsive 1 


298 


L05424 


Human cell surface glycoprotein CD44 (CD44) gene, 

exon 14


299 


U84388 


Human death domain containing protein CRADD
mRNA 


300 


AF014958 


Homo sapiens chemokine receptor X (CKRX) mRNA 




AB014607


Homo sapiens mRNA for KIAA0707 protein


302


AB029038


Homo sapiens mRNA for KIAA1115 protein




at n^oi 1 o 


Unmn ennipn? mRNA* rTYMA DKF7nS86C0Q1 

llUIIlU oapiCIlSi HllVl>J/A, C/J-J1> -rV UI\X£j[JJOUV/U71 | 




A T7n^9 11^ 


liUIIlU od.pi Clio UlUllC ljUOO llirviN Jr\. scijuciitc 


305 


AB006909 


Homo sapiens mRNA for A-type microphthalmia
associated transcription factor


307


aptoozli^o 


T4r*mn ennipne ViTjS4-4-inQ9 i Qnfnrm rhFiS4 i mRNA 
llUlllU oapiCllo ILL/ J*t~lllo^. lolJlUllll ^lifJ i ) iiiivi^r^. 


308 


AI732885 




309 


D14497 


Human mRNA for proto-oncogene protein 


310 


AB020693


Homo sapiens mRNA for KIAA0886 protein 


311 


L41827 


Homo sapiens sensory and motor neuron derived 
factor (SMDF) mRNA, 


312 


U59877 


Human low-Mr GTP-binding protein (RAB31) 
mRNA


313 


L16794


Human transcription factor (MEF2) mRNA 


314 


AF038187


Homo sapiens clone 23714 mRNA sequence 


315 


L29277 


Homo sapiens DNA-binding protein (APRF) mRNA 


317 


L06797 


Human (clone L5) orphan G protein-coupled receptor 

mRNA 


318


AB019392 


Homo sapiens mRNA of muscle specific gene M9 


319 


X65923 


Homo sapiens fau mRNA 


320 


X67309 


Homo sapiens gene for ribosomal protein S6 


321 


AB020680 


Homo sapiens mRNA for KIAA0873 protein 


322 


AL022721 


Human DNA sequence from clone 109F14 on 
chromosome 6p21.2-21.3, which contains the
alternatively spliced gene for Transcriptional 
Enhancer Factor TEF-5, the 60S Ribosomal Protein 
RPL10A gene, a putative ZNF127 LIKE gene, and the 
PPARD for Peroxisome Proliferator Activated 
Receptor Delta (PPAR-Delta, PPAR-Beta, Nuclear 
Hormone Receptor 1, NUC1, NUCI, PPARB). It also
contains three putative CpG islands, ESTs, STSs, 
GSSs and a ca repeat polymorphism. 


323 


AL050147 


Homo sapiens mRNA; cDNA DKFZp586E0820 
(from clone DKFZp586E0820)


324 


U02493 


Human 54 kDa protein mRNA 


325 


AI743507 


wf72a06.x2 Soares_NFL_T_GBC_S I Homo sapiens 

<-TYW A rlrmf l"N/f 1 0^ *V similar tn 

TR:088532 088532 ZINC FINGER RNA BINDING 
PROTEIN 


326 


AF032437 


Homo sapiens mitogen activated protein kinase 
activated protein kinase gene 


327 


U79297 


Human clone 23589 mRNA sequence 



[0074] Each qualifier in Table 2 represents at least one RCC disease gene which is 

differentially expressed in the peripheral blood of RCC patients relative to disease-free
humans. The RNA transcripts of the RCC disease gene can hybridize to the corresponding 
qualifier under stringent or nucleic acid array hybridization conditions. As used herein, 
"hybridize to a qualifier" means to hybridize to at least one oligonucleotide probe listed 
under the qualifier in ATTACHMENT A. For instance, the RNA transcripts of the RCC 
disease gene can hybridize under stringent or nucleic acid array hybridization conditions to 
at least 2, 4, 6, 8, 10, 12, 14 or 16 oligonucleotide probes listed under the corresponding 
qualifier in ATTACHMENT A. The RNA transcripts of the RCC disease gene can also
hybridize under stringent or highly stringent conditions to the CPS of the corresponding
qualifier. 
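
The probe-count criterion described in this paragraph can be illustrated with a minimal sketch. It assumes, purely for illustration, that a hybridization call (True or False) is already available for each oligonucleotide probe listed under a qualifier. The function name `hybridizes_to_qualifier`, the probe identifiers and the example calls are hypothetical and are not taken from ATTACHMENT A.

```python
from typing import Mapping


def hybridizes_to_qualifier(probe_calls: Mapping[str, bool], min_probes: int = 1) -> bool:
    """Return True if at least `min_probes` probes under a qualifier are called as hybridized.

    `probe_calls` maps a probe identifier to a boolean hybridization call.
    The default of 1 mirrors "at least one oligonucleotide probe"; stricter
    thresholds such as 2, 4, ... 16 can be passed explicitly.
    """
    if min_probes < 1:
        raise ValueError("min_probes must be at least 1")
    hits = sum(1 for called in probe_calls.values() if called)
    return hits >= min_probes


# Hypothetical calls for 16 probes under one qualifier: probes 1-10 positive, 11-16 negative.
example_calls = {f"probe_{i:02d}": (i <= 10) for i in range(1, 17)}
```

With these made-up calls, a threshold of 10 probes is satisfied while a threshold of 12 is not.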

[0075] RCC disease genes represented by the qualifiers and CPSs of Table 2 can be 

determined based on the HG-U95Av2 gene chip annotation provided by Affymetrix. They
can also be determined based on the Entrez accession numbers listed in Table 3, as 
appreciated by one of ordinary skill in the art. In addition, the identity of the RCC disease 
genes can be assessed by BLAST searching the corresponding CPSs or oligonucleotide 
probes, such as those listed in Table 2 or ATTACHMENT A, against a human genome 
sequence database. Suitable human genome sequence databases for this purpose include, 
but are not limited to, the Entrez human genome database maintained at the NCBI. The
Entrez human genome database contains about 97.8% of the total human genome sequence,
with finished sequence accounting for about 63% and unfinished sequence for about 34.8%
of the total. The NCBI provides publicly accessible BLAST programs, such as "blastn," for
BLAST searching its sequence database. 

[0076] Each CPS aligns with the protein-coding strand(s) of the corresponding RCC 

disease gene(s). Preferably, each CPS aligns to the corresponding RCC disease gene(s) 
with at least 97% sequence identity. Each CPS can hybridize to the corresponding RCC 
disease gene(s) under stringent or highly stringent conditions. Table 4 lists the CPSs and 
their corresponding RCC disease genes. All of the genes listed in Table 4 are collectively 
referred to as "Gene-Table 4." 
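
The identity criterion above (a CPS aligning to its gene with at least 97% sequence identity) can be checked against BLAST results with a short sketch. It assumes the search was saved in the standard 12-column tab-separated BLAST tabular format (query, subject, percent identity, alignment length, mismatches, gap opens, query start, query end, subject start, subject end, e-value, bit score). The names `BlastHit`, `parse_blast_tabular` and `hits_at_or_above`, and the two example rows, are hypothetical illustrations rather than data from this application.

```python
import csv
import io
from typing import Iterator, List, NamedTuple


class BlastHit(NamedTuple):
    query: str
    subject: str
    percent_identity: float
    alignment_length: int


def parse_blast_tabular(text: str) -> Iterator[BlastHit]:
    """Yield hits from tab-separated BLAST tabular output, skipping blank and comment lines."""
    for row in csv.reader(io.StringIO(text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue
        # Columns 1-4: query id, subject id, percent identity, alignment length.
        yield BlastHit(row[0], row[1], float(row[2]), int(row[3]))


def hits_at_or_above(text: str, min_identity: float = 97.0) -> List[BlastHit]:
    """Keep only hits whose percent identity meets the threshold (97% by default)."""
    return [hit for hit in parse_blast_tabular(text) if hit.percent_identity >= min_identity]


# Made-up rows for illustration only: one hit above and one below the 97% threshold.
example = (
    "CPS_X\tchr_A\t99.20\t501\t4\t0\t1\t501\t1000\t1500\t0.0\t900\n"
    "CPS_X\tchr_B\t91.00\t286\t25\t1\t1\t286\t2000\t2285\t1e-100\t350\n"
)
```

Filtering on percent identity alone ignores alignment length, so in practice a minimum aligned length would probably be applied as well; that choice is left open here.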



Table 4. RCC Disease Genes 



CPS No. 


Corresponding Gene 


Sequences Useful for Making Probe/Primers for 
Detecting the Corresponding Gene 


1 


TLR2 


AF051152 (SEQ ID NO: 1); and
SEQ ID NO: 240 


2 


SLC1A4 


the complement of AA978353 (SEQ ID NO: 2) 


3 


LGALS3 


AB006780 (SEQ ID NO: 3) 


4 


DUSP6 


AB013382 (SEQ ID NO: 4); and 
SEQ ID NO: 241 


5 


KHSRP 


SEQ ID NO: 5; and 
the complement of AA628946 (SEQ ID NO: 
242) 


6 


T54 


U66359 (SEQ ID NO: 6)


7 


RAB13 


X75593 (SEQ ID NO: 7) 


8 


DGCR5 


X91348 (SEQ ID NO: 8) 


9 


ENIGMA 


L35240 (SEQ ID NO: 9) 


10 


ETS2 


AF017257 (SEQ ID NO: 10); and

J04102 (SEQ ID NO: 243) 


11 


PIP5K1C 


AB011161 (SEQ ID NO: 11) 


12 


TCFL1 


D43642 (SEQ ID NO: 12); and 
SEQ ID NO: 244 


13 


UNK_AF055000 


AF055000 (SEQ ID NO: 13) 


14 


IL1RAP 


AB006537 (SEQ ID NO: 14) 


15 


REL 


X75042 (SEQ ID NO: 15) 


16 


ITGA7 


AF032108 (SEQ ID NO: 16) 


17 


PPARD 


L07592 (SEQ ID NO: 17) 


18 


IL1RN 


X52015 (SEQ ID NO: 18) 


19 


LILRB3 


AF025533 (SEQ ID NO: 19) 


20 


FOXO3A


SEQ ID NO: 20; and

AF032886 (SEQ ID NO: 245) 


21 


ANXA5 


U05770 (SEQ ID NO: 21) 


22 


SLC17A7 

(UNK_W26700)


W26700 (SEQ ID NO: 22) 


23 


LOC51172 
(UNK_AF052111 or
APAA)


AF052111 (SEQ ID NO: 23)


24 


MPP1 


M64925 (SEQ ID NO: 24) 


25 


TPM1 


M19267 (SEQ ID NO: 25)


26 


UNK_M62896 


M62896 (SEQ ID NO: 26) 


27 


CSF2 


M13207 (SEQ ID NO: 27)


28 


LHFPL2 


D86961 (SEQ ID NO: 28) 
(3676-4193) 


29 


PARVB 
(UNK_AA187563)


the complement of AA187563 (SEQ ID NO: 29)


30 


MUC1 


J05581 (SEQ ID NO: 30)


31 


MARCO 


AF035819 (SEQ ID NO: 31)


32


DRD2 


X51362 (SEQ ID NO: 32) 


33 


PPY 


the complement of AA844998 (SEQ ID NO: 33) 


34 


AQP9 


AB008775 (SEQ ID NO: 34) 


35 


APS 


AB000520 (SEQ ID NO: 35) 


36 


ALAS2 


X60364 (SEQ ID NO: 36) 


37 


CTSL 


X12451 (SEQ ID NO: 37)


38 


DKFZP586E1621 


AL080235 (SEQ ID NO: 38) 


39 


TfcTJ AOI Oft 

PRO2389
(UNK_W28931)


SEQ ID NO: 39; and
the complement of W28931 (SEQ ID NO: 246)


40 


BLVRB 


D32143 (SEQ ID NO: 40)


41 


GNA13 


L22075 (SEQ ID NO: 41) 


42 


MAP2K3 


D87116 (SEQ ID NO: 42)


43 


BASP1 


AA135683 (SEQ ID NO: 43) 


44 


BNIP3L 


AF079221 (SEQ ID NO: 44) 


45 


DBP 


U48213 (SEQ ID NO: 45) 


46 


HBACH 


U91316 (SEQ ID NO: 46); and
SEQ ID NO: 247 


47 


DGAT 


AF059202 (SEQ ID NO: 47) 


48 


GUK1 


L76200 (SEQ ID NO: 48) 


49 


IL10RB 


L42243 (SEQ ID NO: 49) 


50 


PDNP2 


D45421 (SEQ ID NO: 50)


51 


SLC5A6 
(UNK_AL096737)


AL096737 (SEQ ID NO: 51) 


52 


GPR3 


L32831 (SEQ ID NO: 52)


53 


SOD2 


X07834 (SEQ ID NO: 53); and 
SEQ ID NO: 248


54 


TREX1 


AJ243797 (SEQ ID NO: 54) 


55 


WNT6 
(UNK_H12458)


H12458 (SEQ ID NO: 55)


56 


PIP5K2A 
(UNK_S78798)


S78798 (SEQ ID NO: 56) 


57 


FABP5 


M94856 (SEQ ID NO: 57); and 
SEQ ID NO: 249 


58 


MMP9 


J05070 (SEQ ID NO: 58) 


59 


ATP2B1 


J04027 (SEQ ID NO: 59); and 
SEQ ID NO: 250 


60 


NEUD4 


U43843 (SEQ ID NO: 60) 


61 


CCR1 


D10925 (SEQ ID NO: 61); and 
SEQ ID NO: 251 


62 


C8FW 


AJ000480 (SEQ ID NO: 62); and 
SEQ ID NO: 252 


63 


CLU 


M25915 (SEQ ID NO: 63); and 
SEQ ID NO: 253 


64 


EREG 


D30783 (SEQ ID NO: 64) 


65 


PPAP2B 


AF017786 (SEQ ID NO: 65); and
SEQ ID NO: 254 


66 


TUBB 


X79535 (SEQ ID NO: 66) 


67 


NUP214


D14689 (SEQ ID NO: 67)


68


a t nu C A 1 


VoiiV,) ID JNU. Oo) 


69 


LOC64116 
(also referred to as 
UNK_AL049963)


AL049963 (SEQ ID NO: 69) 


70 


XK 


Z32684 (SEQ ID NO: 70) 


71 


KIAA0837 


AB020644 (SEQ ID NO: 71) 


72 


GYPC 


X12496 (SEQ ID NO: 72)


73 


TFDP1 


L23959 (SEQ ID NO: 73); and 
W28479 (SEQ ID NO: 255) 


74 


C20orf16
(UNK_U61836)


U61836 (SEQ ID NO: 74) 


75 


FCAR 


U43774 (SEQ ID NO: 75) 


76 


ITGB3 


M35999 (SEQ ID NO: 76)


77 


MXI1 


L07648 (SEQ ID NO: 77); and
D63940 (SEQ ID NO: 256) 


78 


CSDA 


M24069 (SEQ ID NO: 78); and
SEQ ID NO: 257 


79 


FIP2 


AF061034 (SEQ ID NO: 79) 


80 


SELENBP1 


U29091 (SEQ ID NO: 80); and 
SEQ ID NO: 258 


81 


PPP1R2 


U68111 (SEQ ID NO: 81) 


82 


HPGD 


X82460 (SEQ ID NO: 82) 


83 


SLC4A1 


SEQ ID NO: 83; and 
M27819 (SEQ ID NO: 259) 


84 


IL17R 


U58917 (SEQ ID NO: 84) 


87 


CBFA2T3 


AB010419 (SEQ ID NO: 85)


89 


RAP1GA1 
(KIAA0474) 


AB007943 (SEQ ID NO: 86) 


90 


BCL2L1 


Z23115 (SEQ ID NO: 87); and
SEQ ID NO: 260 


91 


COPEB 


AF001461 (SEQ ID NO: 88) 


92 


ADM 


D14874 (SEQ ID NO: 89); and 
SEQ ID NO: 261 


93 


SPTB 


J05500 (SEQ ID NO: 90) 


94 


ITGA2B 


M34480 (SEQ ID NO: 91) 


95 


CTNNAL1 
(UNK_U97067)


U97067 (SEQ ID NO: 92) 


96 


SCYA2 


M26683 (SEQ ID NO: 93); and 
M28225 (SEQ ID NO: 262) 


97 


NDUFB7 


the complement of AA527880 (SEQ ID NO: 94) 


98 


SCYA7 


X72308 (SEQ ID NO: 95) 


99 


FCGR1A 


M63835 (SEQ ID NO: 96); and 
SEQ ID NO: 263 


100 


EPB49 


U28389 (SEQ ID NO: 97) 


101 


DD96 


U21049 (SEQ ID NO: 98)


102 


PPARG 


L40904 (SEQ ID NO: 99) 


103 


SPINK1 


the complement of AI961220 (SEQ ID NO:
100) 


104 


PLAUR 


X74039 (SEQ ID NO: 101) 


105 


CDC34 


L22005 (SEQ ID NO: 102) 


106 


UNK_AI732885 


the complement of AI732885 (SEQ ID NO:
103) 


107 


IL10RA 


U00672 (SEQ ID NO: 104) 


108 


FBX7 


AL050254 (SEQ ID NO: 105)


109 


IFIT4 


AF026939 (SEQ ID NO: 106) 


110 


BAX 


U19599 (SEQ ID NO: 107) 


111 


BSG 


X64364 (SEQ ID NO: 108) 


112 


THBS1
(UNK_U12471)


U12471 (SEQ ID NO: 109) 


113 


G2AD 


AF068706 (SEQ ID NO: 110) 


115 


RALBP1 


L42542 (SEQ ID NO: 111)


116 


UNK_AF070587
(LOC196932)


AF070587 (SEQ ID NO: 112)


117 


DUX1 


AJ001481 (SEQ ID NO: 113) 


118 


SLC6A8 


U36341 (SEQ ID NO: 114) 


119 


THBD 


J02973 (SEQ ID NO: 115) 


120 


UNK_AF141349 


AF141349 (SEQ ID NO: 116) 


123 


HBE1


the complement of AI349593 (SEQ ID NO: 
117); and
SEQ ID NO: 264 


125 


MAD 


L06895 (SEQ ID NO: 118) 


126 


TSPAN-5 


AF065389 (SEQ ID NO: 119) 


127 


BAG1


Z35491 (SEQ ID NO: 120) 


128 


PDI2 


AB023211 (SEQ ID NO: 121) 


129 


IL1R1


M27492 (SEQ ID NO: 122) 


130 


NP 


X00737 (SEQ ID NO: 123)


131 


AQP3 
(UNK_N74607)


the complement of N74607 (SEQ ID NO: 124) 


132 


GSPT1


X17644 (SEQ ID NO: 125)


133 


GEF-2 


the complement of AI565760 (SEQ ID NO: 
126) 


134 


HBD 


SEQ ID NO: 127; and
V00505 (SEQ ID NO: 265) 


135 


HAGH 


X90999 (SEQ ID NO: 128) 


136 


ERN1 


AF059198 (SEQ ID NO: 129) 


137 


COL9A1 


X54412 (SEQ ID NO: 130) 


138 


S100A11 


D38583 (SEQ ID NO: 131) 


139 


FKBP1B 


D38037 (SEQ ID NO: 132) 


141 


RNAH 


SEQ ID NO: 133; and
AJ223948 (SEQ ID NO: 266) 


142 


MYRL2 


J02854 (SEQ ID NO: 134) 


143 


SPOP 


AJ000644 (SEQ ID NO: 135) 


144 


SLC11A1 
(UNK_AI679353)


the complement of AI679353 (SEQ ID NO: 
136)


145 


SIAH2 


U76248 (SEQ ID NO: 137); and 
SEQ ID NO: 267 


146 


S100P


AA131149 (SEQ ID NO: 138)


147


TNNT1


AJ011712 (SEQ ID NO: 139);
SEQ ID NO: 268; and
M19309 (SEQ ID NO: 269)


148 


KIAA0750 


AB018293 (SEQ ID NO: 140) 


149 


FOS 


K00650 (SEQ ID NO: 141)


150 


PAI2 


Y00630 (SEQ ID NO: 142) 


151 


PDXK 


U89606 (SEQ ID NO: 143) 


152 


UNK_AL049250 


AL049250 (SEQ ID NO: 144) 


153 


GRO2


M36820 (SEQ ID NO: 145) 


154 


INPP4A 


U96919 (SEQ ID NO: 146)


155 


GPT 


U70732 (SEQ ID NO: 147) 


156 


MYL4 


SEQ ID NO: 148; and 
X58851 (SEQ ID NO: 270) 


157 


NFE2 


S77763 (SEQ ID NO: 149) 


158 


POLR2J 


L37127 (SEQ ID NO: 150) 


159 


CARM1 


AF055027 (SEQ ID NO: 151) 


160 


UNK_AF038171 


AF038171 (SEQ ID NO: 152) 


161 


RAB2 


SEQ ID NO: 153; and 
AF070629 (SEQ ID NO: 271) 


162 


6H9A 


L17330 (SEQ ID NO: 154)


163 


EPB42 


M60298 (SEQ ID NO: 155); and 
SEQ ID NO: 272 


164 


CGTHBA 


X90857 (SEQ ID NO: 156) 


165 


DOC-1R 


AF089814 (SEQ ID NO: 157) 


166 


KIAA0353 


the complement of AI077476 (SEQ ID NO: 
158) 


167 


CSH1 


SEQ ID NO: 159 


168 


LOC51048 


AF034209 (SEQ ID NO: 160) 


169 


SELP 


M25322 (SEQ ID NO: 161) 


170 


RAP1GA1 


M64788 (SEQ ID NO: 162) 


171 


THBS1 


X14787 (SEQ ID NO: 163)


172 


CHRNA4 


U62433 (SEQ ID NO: 164) 


173 


S100A12 


D83664 (SEQ ID NO: 165) 


174 


CD9 


M38690 (SEQ ID NO: 166) 


175 


TDPX1 


L19185 (SEQ ID NO: 167) 


176 


B7 


W27095 (SEQ ID NO: 168) 


177 


BPGM 


X04327 (SEQ ID NO: 169) 


178 


PSMA7 


AF054185 (SEQ ID NO: 170); and 
SEQ ID NO: 273


179 


GMPR 


M24470 (SEQ ID NO: 171); and 
SEQ ID NO: 274 


180 


TMOD 


M77016 (SEQ ID NO: 172)


181 


C4A 


SEQ ID NO: 173; and

U24578 (SEQ ID NO: 275), such as nucleotides 
16881 to 16928 and nucleotides 17131-17239 of 
SEQ ID NO: 275 


182 


GPR12 


U18548 (SEQ ID NO: 174) 


183 


ADFP 


X97324 (SEQ ID NO: 175); and
SEQ ID NO: 276 


184 


MYL5 


L03785 (SEQ ID NO: 176) 


185 


DPM2 


the complement of W80399 (SEQ ID NO: 177) 


186 


MCC 


M62397 (SEQ ID NO: 178) 


187 


F3 


J02931 (SEQ ID NO: 179)


188 


KLF1 


U65404 (SEQ ID NO: 180) 


189 


HBG2 


SEQ ID NO: 181; and

M91036 (SEQ ID NO: 277), such as nucleotides 
zioZ-zZoa, zJyl-zol4 or 33U1-Jdoj oi bfcl^ IJJ 
NO: 277 


190 


GRO3


M36821 (SEQ ID NO: 182)


191 


PLEC1 


U53204 (SEQ ID NO: 183)


192 


SLC16A3 


U81800 (SEQ ID NO: 184)


194 


FKBP8 


L37033 (SEQ ID NO: 185) 


195 


RNASE2 


X55988 (SEQ ID NO: 186) 


196 


BCAT1 


U21551 (SEQ ID NO: 187); and
SEQ ID NO: 278 


199 


SPP1 


J04765 (SEQ ID NO: 188); and
AF052124 (SEQ ID NO: 279) 


201 


GRO1


X54489 (SEQ ID NO: 189)


202 


DKFZP586O0223 


AL096741 (SEQ ID NO: 190) 


205 


FASN 


U29344 (SEQ ID NO: 192) 


206 


HOXA1 


U37431 (SEQ ID NO: 193)


207 


HMOX1 


Z82244 (SEQ ID NO: 194) 


208 


BNIP3 


AF002697 (SEQ ID NO: 195) 


209 


ZNF261 


X95808 (SEQ ID NO: 196) 


210 


MYH7 


M58018 (SEQ ID NO: 197) 


211 


IL1B 


M15330 (SEQ ID NO: 198); and SEQ ID NO:
191 


212 


STX1A 


L37792 (SEQ ID NO: 199) 


213 


ATPASEP 


A TfinAOAQ /QThO IT. "MTV OH^- 

SEQ ID NO: 280 


214 


CR1 


X14362 (SEQ ID NO: 201) 


215 


DKFZP586M1523 


AL050225 (SEQ ID NO: 202) 


216 


KRT1 


M98776 (SEQ ID NO: 203) 


217 


(EXT1)


AF070571 (SEQ ID NO: 204) 


218 


PPP3CB 


M29551 (SEQ ID NO: 205) 


219 


QSCN6 


L42379 (SEQ ID NO: 206) 


220 


PRF1 


M28393 (SEQ ID NO: 281) 


221 


FCGR3B 


X16863 (SEQ ID NO: 208)


222 


PTGS2 


U04636 (SEQ ID NO: 209) 


223 


OPHN1 


AJ001189 (SEQ ID NO: 210)


224 


VSNL1 


AF039555 (SEQ ID NO: 211) 


225 


FECH 


SEQ ID NO: 212; and

D00726 (SEQ ID NO: 282) 


226 


KIAA0483 


AB007952 (SEQ ID NO: 213) 


227 


HK3 


U51333 (SEQ ID NO: 214) 


228 


MS4A3 


L35848 (SEQ ID NO: 215) 


229 


SCYA20 


U64197 (SEQ ID NO: 216) 


230 


C1QR1 


U94333 (SEQ ID NO: 217)


231 


POU1F1 


D10216 (SEQ ID NO: 218); and
D12892 (SEQ ID NO: 283)


232 


TKTL1 


X91817 (SEQ ID NO: 219) 


234 


CCNT2 


AF048732 (SEQ ID NO: 220) 


235 


ATP6V1H
(UNK_W27838)


W27838 (SEQ ID NO: 221) 


236 


FN1 


X02761 (SEQ ID NO: 222) 


237 


UNK_J04178
(HEXA) 


J04178 (SEQ ID NO: 223) 


239 


NR2C1 


M21985 (SEQ ID NO: 224) 


240 


KIAA0168 


W28731 (SEQ ID NO: 225) 


241 


IL6 


X04430 (SEQ ID NO: 226) 


242 


KIAA0372 


AB002370 (SEQ ID NO: 227) 


243 


CYP4F2 


U02388 (SEQ ID NO: 228) 


244 


STIP1 


M86752 (SEQ ID NO: 229) 


245 


CBP2 


D83174 (SEQ ID NO: 230)


246 


UNK_M14087


M14087 (SEQ ID NO: 231) 


247 


NCF1 


M55067 (SEQ ID NO: 232) 


248 


CHN2 


SEQ ID NO: 233; and
U07223 (SEQ ID NO: 284) 


249 


ABL1 


M14752 (SEQ ID NO: 234)


250 


FLOT1 


AF089750 (SEQ ID NO: 235) 


251 


KJiViJL 
(UNK_AL096744)


AL096744 (SEQ ID NO: 236) 


252 


MUC3 


M55406 (SEQ ID NO: 237) 


253 


SMARCA4 


U29175 (SEQ ID NO: 238) 


254 


LOC92684 
(UNK_AF035314)


AF035314 (SEQ ID NO: 239)


255 


EEF1A2 


X70940 (SEQ ID NO: 285) 


256 


BRF2 


U07802 (SEQ ID NO: 286) 





257 


SNRPG 


the complement of AI803447 (SEQ ID NO:
287) 


258 


NUMA1 


Z11584 (SEQ ID NO: 288)


259


AKR1B1 


X15414 (SEQ ID NO: 289)


260 


SMARCE1 


AF035262 (SEQ ID NO: 290); and SEQ ID NO:

328 


261 


KIAA0669 


the complement of AI452442 (SEQ ID NO: 
291) 


262 


MSF 


AB023208 (SEQ ID NO: 292) 


263 


PTMA 


M14630 (SEQ ID NO: 293)


264 


KIAA0410 


AB007870 (SEQ ID NO: 294) 


265 


PSMD3 


D67025 (SEQ ID NO: 295) 


266 


C1QBP 


M69039 (SEQ ID NO: 296) 


267 


OSR1 


AB017642 (SEQ ID NO: 297) 


268 


CD44 


L05424 (SEQ ID NO: 298) 


269 


CRADD 


U84388 (SEQ ID NO: 299) 


270 


CCRL2 


AF014958 (SEQ ID NO: 300) 


271 


KIAA0707 


AB014607 (SEQ ID NO: 301) 


272 


KIAA1113 


AB029036 (SEQ ID NO: 302); and SEQ ID NO: 

316 


273 


UNK_AL050119 


AL050119 (SEQ ID NO: 303)


274 


UNK_AF052115 


AF052115 (SEQ ID NO: 304)


275 


MITF 


AB006909 (SEQ ID NO: 305) 


276 


STAT3 


SEQ ID NO: 306; and L29277 (SEQ ID NO: 
315) 


277 


TPD52L2 


AF004430 (SEQ ID NO: 307) 


278 


UNK_AI732885 


the complement of AI732885 (SEQ ID NO: 308) 


279 


MAP3K8 


D14497 (SEQ ID NO: 309) 


280 


NSP-CL 


AB020693 (SEQ ID NO: 310)


281 


NRG1 


L41827 (SEQ ID NO: 311)


282 


RAB31 


U59877 (SEQ ID NO: 312) 


283 


MEF2D 


L16794 (SEQ ID NO: 313)


284 


UNK_AF038187 


AF038187 (SEQ ID NO: 314) 


285 


CXCR4 


L06797 (SEQ ID NO: 317) 


286 


M9 


AB019392 (SEQ ID NO: 318) 


287 


FAU 


X65923 (SEQ ID NO: 319)


288 


RPS6 


X67309 (SEQ ID NO: 320); and SEQ ID NO:
330 


289 


BAG5 


AB020680 (SEQ ID NO: 321) 


290 


UNK_AL022721 


the complement of SEQ ID NO: 322
(AL022721); and SEQ ID NO: 329 


291 


DKFZP586E0820


AL050147 (SEQ ID NO: 323) 


292 


NONO 


U02493 (SEQ ID NO: 324) 


293 


UNK_AI743507 


the complement of SEQ ID NO: 325
(AI743507); and SEQ ID NO: 331


294 


MAPKAPK5 


AF032437 (SEQ ID NO: 326) 


295 


UNK_U79297 


U79297 (SEQ ID NO: 327) 



[0077] CPS 1 corresponds to TLR2 which encodes toll-like receptor 2. TLR2 has 

LocusID: 7097, and is located on chromosome 4 with reported cytogenetic location 4q32. 
The protein encoded by TLR2 gene is a member of the Toll-like receptor (TLR) family 
which is believed to play a fundamental role in pathogen recognition and activation of 
innate immunity. TLRs are highly conserved from Drosophila to humans and share 
structural and functional similarities. They recognize pathogen-associated molecular 
patterns (PAMPs) that are expressed on infectious agents, and mediate the production of
cytokines necessary for the development of effective immunity. The various TLRs exhibit 
different patterns of expression. TLR2 is reported to be expressed abundantly in peripheral 
blood leukocytes, and to mediate host response to Gram-positive bacteria and yeast via 
stimulation of NF-kappaB. TLR2 may also mediate the signal for apoptosis.

[0078] CPS 2 corresponds to SLC1A4 which encodes solute carrier family 1

(glutamate/neutral amino acid transporter), member 4. SLC1A4 has LocusID: 6509, and is 
localized on chromosome 2 with reported cytogenetic location 2p15-p13. The gene product
is a sodium-dependent neutral amino acid transporter, and has independent chloride channel 
activity. It may function to equilibrate pools of neutral amino acids. 

[0079] CPS 3 corresponds to LGALS3 which encodes lectin, galactoside-binding, 

soluble, 3 (galectin 3). LGALS3 has LocusID: 3958, and is localized on chromosome 14 
with reported cytogenetic location 14q21-q22. LGALS3 may be involved in cell growth 
regulation. 

[0080] CPS 4 corresponds to DUSP6 which encodes dual specificity phosphatase 6. 

DUSP6 has LocusID: 1848, and is localized on chromosome 12 with reported cytogenetic 
location 12q22-q23. 

[0081] The protein encoded by DUSP6 gene is a member of the dual specificity 

protein phosphatase subfamily. These phosphatases may inactivate their target kinases by 
dephosphorylating both the phosphoserine/threonine and phosphotyrosine residues. They 
may negatively regulate members of the mitogen-activated protein (MAP) kinase 
superfamily (MAPK/ERK, SAPK/JNK, p38), which are associated with cellular 
proliferation and differentiation. Different members of the family of dual specificity 
phosphatases show distinct substrate specificities for various MAP kinases, different tissue 
distribution and subcellular localization, and different modes of inducibility of their 
expression by extracellular stimuli. It is reported that DUSP6 gene product inactivates 
ERK2, is expressed in a variety of tissues with high levels of expression in heart and 
pancreas, and is localized in the cytoplasm. Dual specificity protein phosphatase 6 may 
selectively dephosphorylate and inactivate MAP kinase. 

[0082] CPS 5 corresponds to KHSRP which encodes KH-type splicing regulatory 

protein (FUSE binding protein 2). KHSRP has LocusID: 8570, and is localized on 
chromosome 19 with reported cytogenetic location 19p13.3. It is reported that KHSRP
gene product is a component of a multiprotein complex and may be involved in the splicing
of the N1 exon of SRC. The genomic sequence (nucleotides 544983 to 544793 of
chromosome 19) that aligns to CPS 5 is located 3' to the polypeptide-coding sequence of
KHSRP. This genomic sequence is also located 3' to the polypeptide-coding sequence of
LOC125980. LOC125980 encodes a protein similar to complement C3 precursor (human).
It has reported cytogenetic location 19p13.3.

[0083] Nucleotides 1-501 of SEQ ID NO: 241 (AA628946) have about 99%

sequence identity to KHSRP. Consequently, SEQ ID NO: 241 can be used to design probes 
for detecting the expression profile of KHSRP. Nucleotides 1-286 of SEQ ID NO: 241 also 
show about 89-93% sequence identity to a genomic sequence near the polypeptide-coding 
sequence of putative gene LOC138679. LOC138679 encodes a protein similar to KH-type 
splicing regulatory protein (FUSE binding protein 2) and KH-type splicing regulatory 
protein (FUSE-binding protein 2). LOC138679 is located on chromosome 9 with reported
cytogenetic location 9p21.1.
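
Nucleotide ranges such as "nucleotides 1-501" and entries such as "the complement of" an accession recur throughout this section, so a small sketch of both operations may be useful when designing probes. It assumes that positions are 1-based and inclusive, and that "complement" means the reverse complement of the listed strand; both are assumptions made for illustration, and the helper names `subsequence` and `reverse_complement` are hypothetical.

```python
# Translation table for complementing DNA bases (upper and lower case; N stays N).
_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def subsequence(seq: str, start: int, end: int) -> str:
    """Return nucleotides start..end of `seq`, using 1-based inclusive coordinates."""
    if not (1 <= start <= end <= len(seq)):
        raise ValueError("coordinates must satisfy 1 <= start <= end <= len(seq)")
    return seq[start - 1:end]


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]
```

For example, `subsequence(seq, 1, 501)` returns a 501-nucleotide fragment of a sufficiently long sequence, and `reverse_complement("AAGC")` returns `"GCTT"`.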

[0084] CPS 6 corresponds to T54 which encodes T54 protein. T54 has LocusID: 

27238, and is localized on chromosome X with reported cytogenetic location Xp11.23. T54
protein has a region of low similarity to S. cerevisiae Spp2p. 

[0085] CPS 7 corresponds to RAB13, member RAS oncogene family. RAB13 has

LocusID: 5872, and is localized on chromosome 1 with reported cytogenetic location
1q21.2. RAB13 gene product is known as GTP-binding protein 13, and may be involved in
vesicle transport. It is a member of the RAB family of small GTPases. Nucleotides 106-1212
of SEQ ID NO: 7 (X75593) also align to a genomic sequence localized on
chromosome 12 with reported cytogenetic location 12q13.

[0086] CPS 8 corresponds to a genomic sequence (DGCR5) at DiGeorge syndrome

critical region 5 on chromosome 22. The corresponding genomic sequence is located 3' to 
the coding sequence of putative gene LOC128966 (similar to carbonic anhydrase 15).
LOC128966 has LocusID: 9993, and is localized at cytogenetic location 22q11.1.

[0087] CPS 8 also shows about 97% sequence identity to a genomic sequence near

the putative gene LOC91208 on chromosome 22. LOC91208 has reported cytogenetic 
location 22q11.21.

[0088] BLAST search of X91348 (SEQ ID NO: 8) shows a corresponding genomic

sequence which is localized on chromosome 22. The genomic sequence includes putative 
gene LOC200301 (similar to KIAA1647 protein) and DiGeorge syndrome gene A (DGS-A).
DGS-A has LocusID: 25787. Deletions of the region near 22q11.2 have been
associated with a wide range of developmental defects (notably DiGeorge syndrome, 
velocardiofacial syndrome, conotruncal anomaly face syndrome and isolated conotruncal 
cardiac defects) classified under the acronym CATCH 22. 

[0089] In addition, fragments of nucleotides 132 to 699 of X91348 have 91%

sequence identity to CELSR1 which encodes cadherin, EGF LAG seven-pass G-type
receptor 1 (flamingo homolog, Drosophila). CELSR1 has LocusID: 9620, and is also
localized on chromosome 22. 

[0090] CPS 9 corresponds to ENIGMA which encodes enigma (LIM domain 

protein). ENIGMA has LocusID: 9260, and is localized on chromosome 5 with reported 
cytogenetic location 5q35.3. The protein encoded by this gene is representative of a family 
of proteins composed of conserved PDZ and LIM domains. LIM domains are proposed to 
function in protein-protein recognition in a variety of contexts including gene transcription 
and development and in cytoskeletal interaction. The LIM domains of ENIGMA gene 
product may bind to protein kinases, whereas the PDZ domain may bind to actin filaments. 
The gene product may be involved in the assembly of an actin filament-associated complex 
essential for transmission of ret/ptc2 mitogenic signaling. The biological function of 
ENIGMA gene product is proposed to be that of an adapter, with the PDZ domain 
localizing the LIM-binding proteins to actin filaments of both skeletal muscle and 
nonmuscle tissues. It is also reported that ENIGMA gene product can bind to the insulin 
receptor (INSR). 

[0091] CPS 9 also has about 99% sequence identity to LOC220783 which encodes a 

protein similar to enigma (LIM domain protein). LOC220783 is localized on chromosome 
5 with reported cytogenetic location 5q35.3. 

[0092] CPS 10 corresponds to ETS2 which encodes v-ets erythroblastosis virus E26 

oncogene homolog 2 (avian). ETS2 has LocusID: 2114, and is localized on chromosome 21 
with reported cytogenetic location 21q22.2. ETS2 gene product is believed to be a 
transcription factor, and may have a role in some skeletal abnormalities in Down
syndrome. 

[0093] CPS 11 corresponds to PIP5K1C which encodes phosphatidylinositol-4-phosphate

5-kinase, type I, gamma. PIP5K1C has LocusID: 23396, and is localized on
chromosome 19 with reported cytogenetic location 19p13.3.

[0094] CPS 12 corresponds to TCFL1 which encodes transcription factor-like 1. 

The gene has LocusID: 6944, and is localized on chromosome 1 with reported cytogenetic 
location 1q21. The coding sequence of putative gene LOC148320 is located within TCFL1.
LOC148320 also aligns with CPS 12. 

[0095] CPS 13 can be derived from Homo sapiens mRNA for unknown liver 

orphan. The hypothetical gene(s) which corresponds to CPS 13 and produces the RNA
transcripts capable of hybridizing under stringent conditions to CPS 13 is herein referred to
as UNK-AF055000. 

[0096] CPS 14 corresponds to IL1RAP which encodes interleukin 1 receptor 

accessory protein. The gene has LocusID: 3556, and is localized on chromosome 3 with 
reported cytogenetic location 3q28. The gene product is a co-receptor for IL-1RI (IL1R1). 
[0097] CPS 15 corresponds to REL which encodes v-rel reticuloendotheliosis viral 

oncogene homolog (avian). The gene has LocusID: 5966, and is localized on chromosome 
2 at reported cytogenetic location 2p13-p12. The gene product is considered to be a
transcription factor. 

[0098] CPS 16 corresponds to ITGA7 which encodes integrin, alpha 7. The gene 

has LocusID: 3679, and is localized on chromosome 12 with reported cytogenetic location 
12q13.

[0099] ITGA7 encodes integrin alpha chain 7. Integrins are heterodimeric integral 

membrane proteins composed of an alpha chain and a beta chain. Alpha chain 7 undergoes 
post-translational cleavage within the extracellular domain to yield disulfide-linked light 
and heavy chains that join with beta 1 to form an integrin that binds to the extracellular 
matrix protein laminin-1. Alpha 7 beta 1 is a major integrin complex expressed in
differentiated muscle cells. Splice variants of alpha 7 that differ in both the extracellular 
and cytoplasmic domains exist in the mouse. However, to date only a single human 
transcript type has been isolated. It contains extracellular and cytoplasmic domains 
corresponding to the mouse X2 and B variants, respectively. A unique extracellular splice 
variant has been identified in human, although it may represent a minor species and its 
biological significance is unclear. Alpha 7 subunit of integrin is a laminin receptor. 
[0100] Affymetrix annotation suggests that CPS 17 corresponds to PPARD. Blast 

search against the Entrez human genome database shows that CPS 17 also aligns to 
LOC221486 with over 98% sequence identity. LOC221486 encodes a protein similar to 
peroxisome proliferator activated receptor beta (PPAR-beta) (PPAR-delta) (Nuclear 
hormone receptor 1) (NUC1) (NUCI). The gene is localized on chromosome 6 with 
reported cytogenetic location 6p21.1.

[0101] CPS 18 corresponds to IL1RN which encodes interleukin 1 receptor 

antagonist. The gene has LocusID: 3557, and is localized on chromosome 2 with reported 
cytogenetic location 2ql4.2. The gene product can bind to and inhibit the IL-1 receptor. 
The gene product is a member of the interleukin- 1 (IL-1) family. 
[0102] CPS 19 corresponds to LILRB3 which encodes leukocyte immunoglobulin-like 

receptor, subfamily B (with TM and ITIM domains), member 3. The gene has 
LocusID: 11025, and is localized at chromosome 19 with reported cytogenetic location 
19q13.4. The gene product may play a role in regulation of immune responses. It is a 
member of the immunoglobulin superfamily. 

[0103] CPS 19 also shows about 99% sequence identity to LOC163021. 

LOC163021 encodes a protein similar to immunoglobulin-like transcript 5. The gene is 
localized on chromosome 19 with reported cytogenetic location 19q13.42. 
[0104] CPS 20 corresponds to FOXO3A which encodes forkhead box O3A. The 

gene has LocusID: 2309, and is localized at chromosome 6 with reported cytogenetic 
location 6q21. The gene product belongs to the forkhead family of transcription factors 
which are characterized by a distinct forkhead domain. This gene may function as a trigger 
for apoptosis through expression of genes necessary for cell death. Translocation of this 
gene with the MLL gene may be associated with secondary acute leukemia. 
[0105] Nucleotides 1-3183 of SEQ ID NO: 245 (AF032886) share at least 99% 

sequence identity to FOXO3A. Consequently, SEQ ID NO: 245 can be used to design 
probes for detecting the expression of FOXO3A. Nucleotides 672 to 3182 of SEQ ID NO: 
245 also have 98% sequence identity to LOC147167. LOC147167 is similar to 
bA653O20.1 (forkhead box O3A (forkhead Drosophila homolog like 1, FKHRL1)). 
LOC147167 is localized on chromosome 17 with reported cytogenetic location 17p11.1. 
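Because part of SEQ ID NO: 245 also aligns to a second locus, it may be useful when designing probes to know which nucleotide positions align only to the intended gene. The following is a minimal sketch under that assumption; the text above does not state that probes must avoid the shared region, and the helper `subtract_intervals` is hypothetical. Positions are 1-based and inclusive, taken from the ranges given in the preceding paragraph.

```python
def subtract_intervals(target, excluded):
    """Return the parts of `target` not covered by any interval in `excluded`.

    All intervals are (start, end) tuples of 1-based, inclusive positions.
    """
    start, end = target
    free = []
    cursor = start
    for ex_start, ex_end in sorted(excluded):
        if ex_end < cursor or ex_start > end:
            continue  # no overlap with the part of the target still unaccounted for
        if ex_start > cursor:
            free.append((cursor, ex_start - 1))
        cursor = max(cursor, ex_end + 1)
    if cursor <= end:
        free.append((cursor, end))
    return free


# Range of SEQ ID NO: 245 aligning to the intended gene (nucleotides 1-3183)
# and the sub-range that also aligns to the second locus (672-3182).
intended_match = (1, 3183)
second_locus_matches = [(672, 3182)]
unshared_regions = subtract_intervals(intended_match, second_locus_matches)
```

With these inputs the sketch reports nucleotides 1-671 and the single position 3183 as the regions that do not overlap the second alignment.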
[0106] CPS 21 corresponds to ANXA5 which encodes annexin A5. The gene has 

LocusID: 308, and is localized on chromosome 4 with reported cytogenetic location 4q28-q32. 
The gene product belongs to the annexin family of calcium-dependent phospholipid 
binding proteins, some of which have been implicated in membrane-related events along 
exocytotic and endocytotic pathways. The gene product is a phospholipase A2 and protein 
kinase C inhibitory protein with calcium channel activity and a potential role in cellular 
signal transduction, inflammation, growth and differentiation. The gene product has also 
been described as placental anticoagulant protein I, vascular anticoagulant-alpha, endonexin 
II, lipocortin V, placental protein 4 and anchorin CII. The gene contains at least 13 exons, 
and encodes at least one transcript of approximately 1.6 kb and at least one protein product 
with a molecular weight of about 35 kDa. 

[0107] CPS 22 corresponds to SLC17A7 which encodes solute carrier family 17 

(sodium-dependent inorganic phosphate cotransporter), member 7. The gene has LocusID: 
57030, and is localized on chromosome 19 with reported cytogenetic location 19q13. The 
protein encoded by this gene is highly similar to brain specific sodium-dependent inorganic 
phosphate cotransporter [R.norvegicus]. The protein is a vesicle-bound, sodium-dependent 
phosphate transporter. It may be associated with the membranes of synaptic vesicles and 
function in glutamate transport. The protein shares 82% identity with the differentiation- 
associated Na-dependent inorganic phosphate cotransporter. 

[0108] CPS 23 corresponds to LOC51172 (APAA) which encodes 

N-acetylglucosamine-1-phosphodiester alpha-N-acetylglucosaminidase. The gene has 
LocusID: 51172, and is localized on chromosome 16 with reported cytogenetic location 
16p13.13. N-acetylglucosamine-1-phosphodiester alpha-N-acetylglucosaminidase 
(phosphodiester alpha-GlcNAcase) catalyzes the second step in the synthesis of mannose 
6-phosphate, and may be involved in forming the mannose 6-phosphate recognition signal on 
lysosomal enzymes. 

[0109] CPS 24 corresponds to MPP1 which encodes membrane protein, 

palmitoylated 1 (55kD). The gene has LocusID: 4354, and is localized on chromosome X 
with reported cytogenetic location Xq28. Palmitoylated membrane protein 1 is the 
prototype of a family of membrane-associated proteins termed MAGUKs (membrane- 
associated guanylate kinase homologs). MAGUKs interact with the cytoskeleton and 
regulate cell proliferation, signaling pathways, and intracellular junctions. Palmitoylated 
membrane protein 1 contains a conserved sequence, called the SH3 (src homology 3) motif, 
which is found in several other proteins that associate with the cytoskeleton and is suspected 
to play important roles in signal transduction. Palmitoylated membrane protein 1 is similar 
to Drosophila dlg (a tumor suppressor) and guanylate kinases. 

[0110] CPS 25 corresponds to TPM1 which encodes tropomyosin 1 (alpha). The 

gene has LocusID: 7168, and is localized on chromosome 15 with reported cytogenetic 
location 15q22.1. Alpha-tropomyosin 1 binds to actin and troponin, and is a member of a 
family of actin-binding and troponin-binding proteins. 

[0111] CPS 26 corresponds to UNK_M62896 which shows about 99% sequence 

identity with the non-protein coding strand of TRIM2 gene. TRIM2 encodes tripartite 
motif-containing 2, and has LocusID: 23321 with reported cytogenetic location 4q31.23. 
[0112] CPS 26 shows about 86-90% sequence similarity to LOC221025 and 

ANXA2P2. LOC221025 is a hypothetical gene supported by M62895. LOC221025 is 
localized on chromosome 10. ANXA2P2 is localized on chromosome 9, and encodes 
annexin A2 pseudogene 2. In addition, CPS 26 has 91-93% sequence identity with two 
exons of ANXA2. ANXA2 encodes annexin A2, and has LocusID: 302 with reported 
cytogenetic location 15q21-q22. 

[0113] CPS 27 corresponds to CSF2 which encodes colony stimulating factor 2 

(granulocyte-macrophage). The gene has LocusID: 1437, and is localized on chromosome 
5 with reported cytogenetic location 5q31.1. Granulocyte-macrophage colony stimulating 
factor 2 regulates hematopoietic cell differentiation, gene expression, and growth. 
[0114] CPS 28 corresponds to LHFPL2 which encodes lipoma HMGIC fusion 

partner-like 2. The gene has LocusID: 10184, and is localized on chromosome 5 with 
reported cytogenetic location 5q13.3. Part of CPS 28 has about 90% sequence identity to 
LOC220397. LOC220397 encodes high mobility group protein 4 (HMG-4) (High mobility 
group protein 2a) (HMG-2a), and is localized on chromosome 11 with reported cytogenetic 
location 11q14.2. 

[0115] CPS 29 corresponds to PARVB which encodes parvin, beta. The gene has 

LocusID: 29780, and is localized on chromosome 22 with reported cytogenetic location 
22q13.2-q13.33. The gene product is also known as CGI-56 protein. 

[0116] CPS 30 corresponds to MUC1 which encodes mucin 1, transmembrane. The 

gene has LocusID: 4582, and is localized on chromosome 1 with reported cytogenetic 
location 1q21. The MUC1 gene product is a cell surface transmembrane glycoprotein. 
Alterations in glycosylation have been observed in epithelial cancer cells. The MUC1 gene 
contains at least seven exons, and several alternatively spliced variants have been reported. 
[0117] CPS 30 also has at least 99% sequence identity to LOC245755, which is a 

hypothetical gene supported by NM_002456 and X52228. LOC245755 is localized within 
MUC1. 

[0118] CPS 31 corresponds to MARCO which encodes macrophage receptor with 

collagenous structure. The gene has LocusID: 8685, and is localized on chromosome 2 with 
reported cytogenetic location 2q12-q13. The gene product has a collagenous structure that 
contains a bacteria-binding region. 

[0119] CPS 32 corresponds to DRD2 which encodes dopamine receptor D2. The 

gene has LocusID: 1813, and is localized on chromosome 11 with reported cytogenetic 
location 11q23. This gene encodes the D2 subtype of the dopamine receptor. This 
G-protein coupled receptor can increase potassium channel activity, and inhibit adenylyl 
cyclase, calcium flux and phospholipid turnover. A missense mutation in this gene causes 
myoclonus dystonia. Other mutations have been associated with schizophrenia. Alternative 
splicing of this gene results in two transcript variants encoding different isoforms. A third 
variant has been described, but it has not been determined whether this form is normal or 
due to aberrant splicing. 

[0120] CPS 33 corresponds to PPY which encodes pancreatic polypeptide. The 

gene has LocusID: 5539, and is localized on chromosome 17 with reported cytogenetic 
location 17q21. The gene product is a precursor of the pancreatic polypeptide and 
pancreatic icosapeptide. Mature pancreatic peptide can inhibit pancreatic exocrine function. 
[0121] CPS 34 corresponds to AQP9 which encodes aquaporin 9. The gene has 

LocusID: 366, and is localized on chromosome 15 with reported cytogenetic location 
15q22.1-22.2. The aquaporins/major intrinsic protein are a family of water-selective 
membrane channels. Aquaporin 9 has greater sequence similarity with AQP3 and AQP7, 
and they may be a subfamily. Aquaporin 9 allows passage of a wide variety of noncharged 
solutes. Aquaporin 9 stimulates urea transport and osmotic water permeability. There are 
contradicting reports about its role in providing glycerol permeability. Aquaporin 9 may 
also have some role in specialized leukocyte functions such as immunological response and 
bactericidal activity. Aquaporin 9 is expressed in leukocytes. 

[0122] CPS 35 corresponds to APS which encodes adaptor protein with pleckstrin 

homology and src homology 2 domains. The gene has LocusID: 10603, and is localized on 
chromosome 7 with reported cytogenetic location 7q22. The APS protein, expressed in B 
lymphocytes, contains pleckstrin homology and src homology 2 (SH2) domains. In Burkitt 
lymphoma cell lines, it is tyrosine phosphorylated in response to B cell receptor stimulation. 
Because it binds Shc independent of stimulation and Grb2 after stimulation, it appears to 
play a role in signal transduction from the receptor to Shc/Grb2. It may link activated 
tyrosine kinases to signaling pathways. 

[0123] CPS 36 corresponds to ALAS2 which encodes aminolevulinate, delta-, 

synthase 2 (sideroblastic/hypochromic anemia). The gene has LocusID: 212, and is 
localized on chromosome X with reported cytogenetic location Xp11.21. The ALAS2 gene 
product catalyzes the first step in the heme biosynthetic pathway. A second delta- 
aminolevulinate synthase gene (ALAS1) is located on chromosome 3 and is expressed in 
various tissues. A defective ALAS2 gene may cause X-linked pyridoxine-responsive 
sideroblastic anemia (Hypochromic Anemia). The gene product is also known as erythroid- 
specific delta-aminolevulinate synthase. 
[0124] CPS 36 has about 99% sequence identity to LOC203568. LOC203568 

encodes a protein similar to 5-aminolevulinic acid synthase, erythroid-specific, 
mitochondrial precursor (Delta-aminolevulinate synthase) (Delta-ALA synthetase) 
(ALAS-E). The gene is located on chromosome X with reported cytogenetic location Xp11.22. 
[0125] CPS 37 corresponds to CTSL which encodes cathepsin L. The gene has 

LocusID: 1514, and is located on chromosome 9 with reported cytogenetic location 9q21- 
q22. The gene product is a lysosomal cysteine (thiol) protease that can cleave collagen and 
elastin. 

[0126] CPS 37 has about 80-90% sequence identity to certain other genes. These 

genes include LOC118945, LOC119215 and LOC219343. LOC118945 is similar to 
Cathepsin L precursor (Major excreted protein) (MEP). It is located on chromosome 10 
with reported cytogenetic location 10q23.32. LOC119215 is also similar to Cathepsin L 
precursor (Major excreted protein) (MEP). It has reported cytogenetic location 10q21.1. 
LOC219343 has reported cytogenetic location 10q23.2. 

[0127] CPS 38 corresponds to DKFZP586E1621 which encodes Ras-induced 

senescence 1. The gene has LocusID: 25907, and is located on chromosome 3 with reported 
cytogenetic location 3p21.3. The gene is also known as RIS1. 

[0128] CPS 39 corresponds to PRO2389 which encodes a hypothetical protein. The 

gene has LocusID: 80344, and is localized on chromosome 14 with reported cytogenetic 
location 14q11.2. The gene product is weakly similar to a 38kDa splicing factor 
[H. sapiens]. 

[0129] CPS 40 corresponds to BLVRB which encodes biliverdin reductase B (flavin 

reductase (NADPH)). The gene has LocusID: 645, and is located on chromosome 19 with 
reported cytogenetic location 19q13.1-q13.2. 

[0130] CPS 41 corresponds to GNA13 which encodes guanine nucleotide binding 

protein (G protein), alpha 13. The gene has LocusID: 10672, and is located on chromosome 
17 with reported cytogenetic location 17q22-q24. The gene product is a component of 
heterotrimeric G-protein complexes. 

[0131] CPS 41 shows about 75-80% sequence similarity to a genomic sequence near 

LOC130117. LOC130117 is similar to zinc finger protein 10 (KOX 1), and located on 
chromosome 2 with reported cytogenetic location 2p11.2. 

[0132] CPS 42 corresponds to MAP2K3 which encodes mitogen-activated protein 

kinase kinase 3. The gene has LocusID: 5606, and is located on chromosome 17 with 
reported cytogenetic location 17q11.2. The protein encoded by this gene is a dual 
specificity protein kinase that belongs to the MAP kinase kinase family. This kinase can be 
activated by mitogenic and environmental stress, and may participate in the MAP kinase- 
mediated signaling cascade. It can phosphorylate and thus activate MAPK14/p38-MAPK. 
This kinase can also be activated by insulin, and may be necessary for the expression of 
glucose transporter. Expression of RAS oncogene is found to result in the accumulation of 
the active form of this kinase, which thus leads to the constitutive activation of MAPK14, 
and confers oncogenic transformation of primary cells. The inhibition of this kinase is 
involved in the pathogenesis of Yersinia pseudotuberculosis. Three alternatively spliced 
transcript variants of this gene encoding distinct isoforms have been reported. 
[0133] CPS 42 has about 96-98% sequence identity to LOC146732. LOC146732 is 

similar to MAP kinase kinase 3b, and has reported cytogenetic location 17p13.1. 
[0134] CPS 43 corresponds to BASP1 which encodes brain abundant, membrane 

attached signal protein 1. The gene has LocusID: 10409, and is located on chromosome 5 
with reported cytogenetic location 5p15.1-p14. Nucleotides 433 to 554 of AA135683 also 
have 91% sequence identity to putative gene LOC222467 which is located on chromosome 
13 with reported cytogenetic location 13q12.11. 

[0135] CPS 44 corresponds to BNIP3L which encodes BCL2/adenovirus E1B 19kD 

interacting protein 3-like. The gene has LocusID: 665, and is located on chromosome 8 
with reported cytogenetic location 8p21. This gene is a member of the BCL2/adenovirus 
E1B 19 kD-interacting protein (BNIP) family. The BNIP3L gene product can interact with the 
E1B 19 kDa protein, which is responsible for protection from virally-induced cell death. 
The gene product is a functional homolog of BNIP3, a proapoptotic protein. The gene 
product may function simultaneously with BNIP3 and play a role in tumor suppression. 
The gene product can also bind cellular Bcl2 or Bcl2L1, and may promote apoptosis. 
[0136] CPS 45 corresponds to DBP which encodes D site of albumin promoter 

(albumin D-box) binding protein. The gene has LocusID: 1628, and is located on 
chromosome 19 with reported cytogenetic location 19q13.3. The gene product may 
function as a transcription factor and play a role in the diurnal regulation of liver-specific 
genes. It is a member of the PAR (proline and acidic amino acid-rich) b/ZIP family. 
[0137] CPS 46 corresponds to BACH (hBACH) which encodes brain acyl-CoA 

hydrolase. The gene has LocusID: 11332, and is located on chromosome 1 with reported 
cytogenetic location 1p36.31-p36.11. The gene product is a member of the acyl coenzyme 
family. It can hydrolyze the CoA thioester of palmitoyl-CoA and other long-chain fatty 
acids. The gene product is also known as cytosolic acyl coenzyme A thioester hydrolase. 
[0138] Nucleotides 76-1101 of SEQ ID NO: 46 (U91316) have about 89% sequence 

identity to LOC132927 which encodes a protein similar to cytosolic acyl coenzyme A 
thioester hydrolase (Long chain acyl-CoA thioester hydrolase) (CTE-II) (Brain acyl-CoA 
hydrolase) (BACH). LOC132927 is located on chromosome 4 with reported cytogenetic 
location 4p14. 

[0139] CPS 47 corresponds to DGAT1 which encodes diacylglycerol 

O-acyltransferase homolog 1 (mouse). The gene has LocusID: 8694, and is located on 
chromosome 8 with reported cytogenetic location 8qter. The enzyme encoded by this gene 
utilizes diacylglycerol and fatty acyl CoA as substrates in order to catalyze the final stage of 
triacylglycerol synthesis. It is also involved in cellular as well as physiological metabolic 
processes. 

[0140] CPS 48 corresponds to GUK1 which encodes guanylate kinase 1. The gene 

has LocusID: 2987, and is located on chromosome 1 with reported cytogenetic location 
1q32-q41. The gene product can convert GMP to GTP as part of the cGMP cycle. 
[0141] CPS 49 corresponds to IL10RB which encodes interleukin 10 receptor, beta. 

The gene has LocusID: 3588, and is located on chromosome 21 with reported cytogenetic 
location 21q22.11. Interleukin 10 receptor beta subunit transduces a signal upon binding of 
interleukin-10 (IL10). It is a class II member of the cytokine receptor family (CRF2). 
[0142] The chromosomal region that aligns to CPS 49 is also located 3' to the 

polypeptide-coding sequence of IFNAR2. IFNAR2 encodes interferon (alpha, beta and 
omega) receptor 2. The gene has LocusID: 3455, and is located on chromosome 21 with 
reported cytogenetic location 21q22.11. 

[0143] CPS 50 corresponds to ENPP2 (PDNP2) which encodes ectonucleotide 

pyrophosphatase/phosphodiesterase 2 (autotaxin). The gene has LocusID: 5168, and is 
located on chromosome 8 with reported cytogenetic location 8q24.1. Autotaxin is a potent 
tumor cell motility-stimulating protein. The gene product is also known as 
phosphodiesterase I/nucleotide pyrophosphatase 2 (autotaxin). 

[0144] Nucleotides 375-452, 1241-1277, 1576-1761 and 1399-1488 of SEQ ID NO: 

50 (D45421) also have 97-100% sequence identity to a genomic sequence near LOC206890 
on chromosome 8. LOC206890 is similar to cytochrome c (somatic) and has reported 
cytogenetic location 8q12.3. 
[0145] CPS 51 corresponds to SLC5A6 which encodes solute carrier family 5 

(sodium-dependent vitamin transporter), member 6. The gene has LocusID: 8884, and is 
located on chromosome 2 with reported cytogenetic location 2p23. The gene product 
functions in the transplacental transfer of pantothenate, biotin and lipoate. Nucleotides 962 
to 1314 of SEQ ID NO: 51 (AL096737) have about 90% identity to TCF23 (LocusID: 
150921) which encodes transcription factor 23 and is located on chromosome 2 with 
reported cytogenetic location 2p23.3. 

[0146] CPS 52 corresponds to GPR3 which encodes G protein-coupled receptor 3. 

The gene has LocusID: 2827, and is located on chromosome 1 with reported cytogenetic 
location 1p36.1-p35. The gene product can activate adenylate cyclase in cell lines, and is a 
member of the G protein-coupled receptor family. 

[0147] CPS 53 corresponds to SOD2 which encodes superoxide dismutase 2, 

mitochondrial. The gene has LocusID: 6648, and is located on chromosome 6 with reported 
cytogenetic location 6q25.3. The gene product is an intramitochondrial free radical 
scavenging enzyme, and has strong similarity to murine Sod2. 

[0148] CPS 54 corresponds to TREX1 which encodes three prime repair 

exonuclease 1. The gene has LocusID: 11277, and is located on chromosome 3 with 
reported cytogenetic location 3p21.3-p21.2. This gene uses at least two different open 
reading frames. The upstream ORF encodes proteins which interact with the ataxia 
telangiectasia and Rad3 related protein, a checkpoint kinase. The proteins encoded by this 
upstream ORF localize to intranuclear foci following DNA damage and may be important 
components of the DNA damage checkpoint. The downstream ORF encodes proteins with 
3' exonuclease activity. Other enzymes with this activity are involved in DNA replication, 
repair, and recombination. Similarity to an E. coli protein suggests that the enzymes 
encoded by this ORF may be a subunit of DNA polymerase III, which does not have 
intrinsic exonuclease activity. Both ORFs are subject to alternative splicing, resulting in at 
least six transcript variants. 

[0149] CPS 54 also has about 99% sequence identity to at least parts of LOC200884 

and LOC152456. Both genes are located within TREX1. LOC200884 encodes protein(s) 
similar to three prime repair exonuclease 1 (isoform b), 3' repair exonuclease 1, 
deoxyribonuclease III (dnaQ/mutD (E. coli)-like), and ATR interacting protein. 
LOC200884 has reported cytogenetic location 3p21.31. LOC152456 encodes protein(s) 
similar to three prime repair exonuclease 1 (isoform b), 3' repair exonuclease 1, 
deoxyribonuclease III (dnaQ/mutD (E. coli)-like), and ATR interacting protein. It has 
reported cytogenetic location 3p21.31. 

[0150] CPS 55 corresponds to WNT6 which encodes wingless-type MMTV 

integration site family, member 6. The gene has LocusID: 7475, and is located on 
chromosome 2 with reported cytogenetic location 2q35. The WNT gene family consists of 
structurally related genes which encode secreted signaling proteins. These proteins have 
been implicated in oncogenesis and in several developmental processes, including 
regulation of cell fate and patterning during embryogenesis. This gene is a member of the 
WNT gene family. It is overexpressed in a cervical cancer cell line and strongly 
coexpressed with another family member, WNT10A, in a colorectal cancer cell line. The 
gene overexpression may play key roles in carcinogenesis. This gene and the WNT10A 
gene are clustered in the chromosome 2q35 region. The protein encoded by this gene is 
97% identical to the mouse Wnt6 protein at the amino acid level. 

[0151] CPS 56 corresponds to PIP5K2A which encodes phosphatidylinositol-4- 

phosphate 5-kinase, type II, alpha. The gene has LocusID: 5305, and is located on 
chromosome 10 with reported cytogenetic location 10p11.23. 
Phosphatidylinositol-4,5-bisphosphate, the precursor to second messengers of the phosphoinositide signal 
transduction pathways, is thought to be involved in the regulation of secretion, cell 
proliferation, differentiation, and motility. The protein encoded by this gene is one of a 
family of enzymes capable of catalyzing the phosphorylation of phosphatidylinositol-4- 
phosphate on the fifth hydroxyl of the myo-inositol ring to form phosphatidylinositol-4,5- 
bisphosphate. The gene product exhibits kinase activity. This gene is a member of the 
phosphatidylinositol-4-phosphate 5-kinase family. The gene product is also known as 1- 
phosphatidylinositol-4-phosphate-5-kinase isoform C. 

[0152] CPS 57 corresponds to FABP5 which encodes fatty acid binding protein 5 

(psoriasis-associated). FABP5 gene has LocusID: 2171, and is located on chromosome 8 
with reported cytogenetic location 8q21.13. The gene encodes the fatty acid binding protein 
found in epidermal cells, and was identified as being upregulated in psoriasis tissue. Fatty 
acid binding proteins are a family of small, highly conserved, cytoplasmic proteins that bind 
long-chain fatty acids and other hydrophobic ligands. It is thought that fatty acid binding 
proteins are involved in fatty acid uptake, transport, and metabolism. FABP5 gene product 
binds to stearic acid and may have a role in keratinocyte differentiation. 
[0153] CPS 57 also shows 100% sequence alignment with an intron sequence of 

STX3A which encodes syntaxin 3A. The gene has LocusID: 6809, and is located on 
chromosome 11 with reported cytogenetic location 11q12.3. Syntaxin 3A is involved in 
intracellular protein transport. 

[0154] In addition, CPS 57 has about 95-97% sequence identity to LOC95551, 

LOC220113, LOC114948, LOC220832, and LOC150161. LOC95551 is similar to fatty 
acid-binding protein, epidermal (E-FABP) (psoriasis-associated fatty acid-binding protein 
homolog) (PA-FABP). LOC95551 is located on chromosome 13 with reported cytogenetic 
location 13q21.33. LOC220113 encodes fatty acid-binding protein, epidermal (E-FABP) 
(psoriasis-associated fatty acid-binding protein homolog) (PA-FABP). LOC220113 has 
reported cytogenetic location 13q14.13. LOC220113 is within an intron of ATP7B which 
encodes ATPase, Cu++ transporting, beta polypeptide (Wilson disease), and has LocusID: 
540. 

[0155] LOC114948 encodes a protein similar to fatty acid-binding protein, 

epidermal (E-FABP) (psoriasis-associated fatty acid-binding protein homolog) (PA-FABP). 
It is located on chromosome 15 with reported cytogenetic location 15q25.3. LOC220832 
also encodes a protein similar to fatty acid-binding protein, epidermal (E-FABP) (psoriasis- 
associated fatty acid-binding protein homolog) (PA-FABP). It has reported cytogenetic 
location 7q36.1. Similarly, LOC150161 encodes a protein similar to fatty acid-binding 
protein, epidermal (E-FABP) (psoriasis-associated fatty acid-binding protein homolog) 
(PA-FABP). It is located on chromosome 22 with reported cytogenetic location 22q11.1. 
[0156] Furthermore, CPS 57 has about 89-93% sequence identity to BTBD1, 

LOC130962, LOC152940 and LOC204114. BTBD1 encodes BTB (POZ) domain 
containing 1. It has LocusID: 53339, and is located on chromosome 15 with reported 
cytogenetic location 15q24. The gene product contains a BTB/POZ domain, and may 
function as DNA or actin binding protein. LOC130962 encodes a protein similar to fatty 
acid-binding protein, epidermal (E-FABP) (psoriasis-associated fatty acid-binding protein 
homolog) (PA-FABP). The gene has reported cytogenetic location 2q23.3. Likewise, 
LOC152940 encodes a protein similar to unnamed protein product. It is located on 
chromosome 4 with reported cytogenetic location 4q31.3-q32.1. LOC204114 encodes a 
protein similar to fatty acid binding protein homolog. It has reported cytogenetic location 
13q31.3. 
[0157] CPS 58 corresponds to MMP9 which encodes matrix metalloproteinase 9 

(gelatinase B, 92kD gelatinase, 92kD type IV collagenase). The gene has LocusID: 4318, 
and is located on chromosome 20 with reported cytogenetic location 20q11.2-q13.1. 
Proteins of the matrix metalloproteinase (MMP) family are involved in the breakdown of 
extracellular matrix in normal physiological processes, such as embryonic development, 
reproduction, and tissue remodeling, as well as in disease processes, such as arthritis and 
metastasis. Most MMPs are secreted as inactive proproteins which are activated when 
cleaved by extracellular proteinases. The enzyme encoded by this gene can degrade type IV 
and V collagens. Studies in rhesus monkeys suggest that the enzyme is involved in 
IL-8-induced mobilization of hematopoietic progenitor cells from bone marrow, and murine 
studies suggest a role in tumor-associated tissue remodeling. 

[0158] CPS 59 corresponds to ATP2B1 which encodes ATPase, Ca++ transporting, 

plasma membrane 1. The gene has LocusID: 490, and is located on chromosome 12 with 
reported cytogenetic location 12q21-q23. 

[0159] Nucleotides 2623 to 2814 of SEQ ID NO: 59 (J04027) have about 81% 

sequence identity to ATP2B4 which encodes ATPase, Ca++ transporting, plasma membrane 
4. ATP2B4 has LocusID: 493, and is located on chromosome 1. Nucleotides 4365-4398 of 
SEQ ID NO: 59 have 100% sequence identity to FLJ14075 which encodes hypothetical 
protein FLJ14075. FLJ14075 has LocusID: 79954, and is located on chromosome 2. 
[0160] CPS 60 corresponds to NEUD4 which encodes neuro-d4 (rat) homolog. The 

gene has LocusID: 8193, and is located on chromosome 19 with reported cytogenetic 
location 19q13.13. The gene product contains at least a zinc finger DNA binding domain. 
Nucleotides 61-198 of U43843 have 86% sequence identity to CERD4 which encodes cer-d4 
(mouse) homolog. CERD4 has LocusID: 8110, and is located on chromosome 14 with 
reported cytogenetic location 14q24.3-q31.1. 

[0161] CPS 61 corresponds to CCR1 which encodes chemokine (C-C motif) 

receptor 1. The gene has LocusID: 1230, and is located on chromosome 3 with reported 
cytogenetic location 3p21. The gene product is a member of the beta chemokine receptor 
family, and is predicted to be a seven transmembrane protein similar to G protein-coupled 
receptors. The ligands of this receptor include macrophage inflammatory protein 1 alpha 
(MIP-1 alpha), monocyte chemoattractant protein 3 (MCP-3), and myeloid progenitor 
inhibitory factor-1 (MPIF-1). Signal transduction mediated by chemokines and their 
receptors is believed to be important for the recruitment of effector immune cells to the site 
of inflammation. Knockout studies of the mouse homolog suggest a role for this gene in 
host protection from inflammatory response, and susceptibility to virus and parasite. This 
gene and other chemokine receptor genes, including CCR2, CCRL2, CCR3, CCR5 and 
CCXCR1, are found to form a gene cluster on chromosome 3p. The protein encoded by this 
gene can bind to chemokines of the CC subfamily and mediate intracellular calcium flux. 
[0162] CPS 62 corresponds to C8FW which encodes a phosphoprotein regulated by 

mitogenic pathways. The protein is similar to protein kinases. The gene has LocusID: 
10221, and is located on chromosome 8 with reported cytogenetic location 8q24.13. 
[0163] CPS 63 corresponds to CLU which encodes clusterin (complement lysis 

inhibitor, SP-40,40, sulfated glycoprotein 2, testosterone-repressed prostate message 2, 
apolipoprotein J). The gene has LocusID: 1191, and is located on chromosome 8 with 
reported cytogenetic location 8p21-p12. Clusterin is a glycoprotein and can be found in 
high density lipoproteins and endocrine and neuronal granules. It may have a role in the 
terminal complement reaction. 

[0164] CPS 64 corresponds to EREG which encodes epiregulin. The gene has 

LocusID: 2069, and is located on chromosome 4 with reported cytogenetic location 4q13.3. 
Epiregulin is a member of the epidermal growth factor family. Epiregulin can function as a 
ligand of EGFR (epidermal growth factor receptor), as well as a ligand of members of the 
ERBB (v-erb-b2 oncogene homolog) family of tyrosine-kinase receptors. Epiregulin may 
promote cell proliferation. 

[0165] CPS 65 corresponds to PPAP2B which encodes phosphatidic acid 

phosphatase type 2B. The gene has LocusID: 8613, and is located on chromosome 1 with 
reported cytogenetic location 1pter-p22.1. The gene product is magnesium-independent 
phosphatidic acid phosphatase 2b. It can convert phosphatidic acid to diacylglycerol. It can 
also hydrolyze lysophosphatidate, ceramide-1-phosphate, and sphingosine-1-phosphate. 
[0166] CPS 66 corresponds to TUBB which encodes tubulin, beta polypeptide. The 

gene has LocusID: 7280, and is located on chromosome 6 with reported cytogenetic 
location 6p21.3. Beta tubulin can polymerize to form microtubules. It is a member of a 
family of structural proteins. 

[0167] Nucleotides 119-231 and 340-939 of SEQ ID NO: 66 (X79535) also have 

over 99% sequence identity to a genomic sequence between TUBB and LOC221753. 
LOC221753 is located on chromosome 6. 
[0168] In addition, nucleotides 58-120 and 340-1397 of X79535 have about 98% 

sequence identity to LOC221753. LOC221753 has reported cytogenetic location 6p24.3. 
[0169] Moreover, fragments of X79535 exhibit about 82-92% sequence identity to 

certain other genes. These genes include TUBB5, TUBB4, LOC139112, LOC157586, 
LOC203068, LOC92755 and GABRR2. TUBB5 encodes tubulin, beta, 5. It has LocusID: 
10382, and is located on chromosome 19 with reported cytogenetic location 19p13.3.
The TUBB5 gene spans nucleotides 637115 to 644163 of chromosome 19. Beta 5-tubulin can
polymerize to form microtubules. TUBB4 encodes tubulin, beta, 4. It has LocusID: 10381, 
and is located on chromosome 16 with reported cytogenetic location 16q24.3. Beta 4-tubulin
can also polymerize to form microtubules. LOC139112 encodes a protein similar to
tubulin beta. The gene has reported cytogenetic location Xq25. LOC157586 and 
LOC203068 encode proteins similar to hypothetical protein DKFZp564N123.1 - human
(fragment). Both genes have reported cytogenetic location 8p21.1. LOC92755 is a 
hypothetical gene, and has reported cytogenetic location 8p21.1. GABRR2 encodes 
gamma-aminobutyric acid (GABA) receptor, rho 2. It has LocusID: 2570 and reported
cytogenetic location 6q13-q16.3. GABA is a major inhibitory neurotransmitter in the
mammalian brain where it can act at GABA receptors, which are ligand-gated chloride 
channels. GABRR2 is a member of the rho subunit family. 

[0170] CPS 67 corresponds to NUP214 which encodes nucleoporin 214kD (CAIN). 

The gene has LocusID: 8021, and is located on chromosome 9 with reported cytogenetic 
location 9q34.1. Nucleoporin 214kD is a protein localized to the cytoplasmic aspect of the
nuclear pore complex. It contains FXFG repeats. 

[0171] A fragment consisting of nucleotides 3712 to 5515 of D14689 (SEQ ID NO: 67) has

100% sequence identity to LOC158306. LOC158306 encodes a protein similar to
nucleoporin 214kD (CAIN), and has reported cytogenetic location 9q34.2. LOC158306 is
located within an exon of the NUP214 gene.

[0172] CPS 68 corresponds to ALDH5A1 which encodes aldehyde dehydrogenase 5

family, member A1 (succinate-semialdehyde dehydrogenase). The gene has LocusID:
7915, and is located on chromosome 6 with reported cytogenetic location 6p22. CPS 68 
aligns to nucleotides 32909278 to 32909817 of chromosome 6, and is located in the 3' 
untranslated region of ALDH5A1. Aldehyde dehydrogenase 5A1 (succinic semialdehyde
dehydrogenase) is involved in 4-aminobutyric acid degradation.
[0173] Nucleotides 45212 to 44763 of SEQ ID NO: 68 (AL031230) have about 90%

sequence identity to HSPCAL3 which encodes heat shock 90kD protein 1, alpha-like 3. 
The HSPCAL3 gene has LocusID: 3324 and reported cytogenetic location 11p14.2-p14.1. In
addition, nucleotides 11858 to 12096 of AL031230 show 86% sequence identity to a 
genomic sequence on chromosome 1.

[0174] CPS 69 corresponds to LOC64116. The gene has LocusID: 64116, and is 

located on chromosome 4 with reported cytogenetic location 4q22-q24. The gene is up- 
regulated by BCG-CWS. 

[0175] CPS 70 corresponds to XK which encodes Kell blood group precursor 

(McLeod phenotype). The gene has LocusID: 7504, and is located on chromosome X with 
reported cytogenetic location Xp21.1. This locus controls the synthesis of the Kell blood 
group "precursor substance" (Kx). Mutations in this gene have been associated with McLeod
syndrome, an X-linked, recessive disorder characterized by abnormalities in the 
neuromuscular and hematopoietic systems. The encoded protein is a member of transporter 
family and has structural characteristics of prokaryotic and eukaryotic membrane transport 
proteins. 

[0176] CPS 71 corresponds to KIAA0837 (FACL6) which encodes long fatty acyl- 

CoA synthetase 2 gene (fatty-acid-Coenzyme A ligase, long-chain 6). The gene has 
LocusID: 23305, and is located on chromosome 5 with reported cytogenetic location 5q31. 
[0177] CPS 72 corresponds to GYPC which encodes glycophorin C (Gerbich blood 

group). The gene has LocusID: 2995, and is located on chromosome 2 with reported 
cytogenetic location 2q14-q21. Glycophorin C (GYPC) is an integral membrane
glycoprotein. It is a minor species carried by human erythrocytes, but plays an important 
role in regulating the mechanical stability of red cells. A number of glycophorin C 
mutations have been described. The Gerbich and Yus phenotypes are due to deletion of 
exons 3 and 2, respectively. The Webb and Duch antigens, also known as glycophorin D,
result from single point mutations of the glycophorin C gene. The glycophorin C protein 
has homology with glycophorins A and B. 

[0178] CPS 73 corresponds to TFDP1 which encodes transcription factor Dp-1. The

gene has LocusID: 7027, and is located on chromosome 13 with reported cytogenetic 
location 13q34. The gene product may heterodimerize with E2F to transactivate genes 
involved in cell cycle progression from G1 to S-phase. TFDP1, CUL4A, and CDC16 are
probable targets of an amplification mechanism and may be involved, together or

separately, in development and/or progression of some hepatocellular carcinomas. 

[0179] CPS 73, as well as nucleotides 9 to 1440 of L23959 (SEQ ID NO: 73), have 

about 95% sequence identity to LOC245788 on chromosome 8. LOC245788 is reported to 

encode transcription factor DP-1 (E2F dimerization partner 1) (DRTF1-polypeptide-1)

(DRTF1). 

[0180] In addition, CPS 73 has about 87-90% sequence identity to LOC126611 and

LOC51270. LOC126611 encodes a protein similar to transcription factor DP-1 (E2F 
dimerization partner 1) (DRTF1-polypeptide-1) (DRTF1). It is located on chromosome 1
with reported cytogenetic location 1q31.3. LOC51270 encodes E2F-like protein which is
similar to a region of human transcription factor Dp-1. The gene has LocusID: 51270, and 
is located on chromosome X with reported cytogenetic location Xq26.2. 
[0181] Nucleotides 1001 to 1440 of SEQ ID NO: 73 (L23959) have about 87% 

sequence identity to CD36 which encodes CD36 antigen (collagen type I receptor, 
thrombospondin receptor). The gene has LocusID: 948, and is located on chromosome 7 
with reported cytogenetic location 7q11.2. CD36 is a receptor for thrombospondin and
collagen in platelets. It functions in cell adhesion. It has a role in platelet-collagen 
adhesion, and can bind to long chain fatty acids. The protein is strongly similar to rat FAT. 
Nucleotides 9 to 947 of SEQ ID NO: 73 have 95% sequence identity to LOC123471 which 
encodes a protein similar to transcription factor DP-1 (E2F dimerization partner 1) (DRTF1- 
polypeptide-1) (DRTF1). LOC123471 has reported cytogenetic location 15q23. 
[0182] CPS 74 corresponds to C20orf16 which encodes chromosome 20 open

reading frame 16. The gene has LocusID: 54498, and is located on chromosome 20 with 
reported cytogenetic location 20p13. The protein is a member of the flavin containing
amine oxidase family. It is weakly similar to monoamine oxidase B (MAOB).
[0183] CPS 75 corresponds to FCAR which encodes a receptor for Fc fragment of 

IgA. The gene has LocusID: 2204, and is located on chromosome 19 with reported 
cytogenetic location 19q13.2-q13.4. This gene is a member of the immunoglobulin gene
superfamily and encodes a receptor for the Fc region of IgA. The receptor is a 
transmembrane glycoprotein present on the surface of myeloid lineage cells such as 
neutrophils, monocytes, macrophages, and eosinophils, where it may mediate immunologic 
responses to pathogens. It may interact with IgA-opsonized targets and trigger several 
immunologic defense processes, including phagocytosis, antibody-dependent cell-mediated 
cytotoxicity, and stimulation of the release of inflammatory mediators. At least ten
transcript variants encoding different isoforms have been described for this gene. The gene 
product is also known as Fc alpha R. 

[0184] CPS 76 corresponds to ITGB3 which encodes integrin, beta 3 (platelet 

glycoprotein IIIa, antigen CD61). The gene has LocusID: 3690, and is located on
chromosome 17 with reported cytogenetic location 17q21.32. The ITGB3 protein product 
is the integrin beta chain beta 3. Integrins are integral cell-surface proteins composed of an 
alpha chain and a beta chain. A given chain may combine with multiple partners resulting 
in different integrins. Integrin beta 3 is found along with the alpha IIb chain in platelets.
Integrins are known to participate in cell adhesion as well as cell-surface mediated 
signaling. This gene product may be involved in mediating platelet aggregation. 
[0185] CPS 77 corresponds to MXI1 which encodes MAX interacting protein. The 

gene has LocusID: 4601, and is located on chromosome 10 with reported cytogenetic 
location 10q24-q25. Expression of the c-myc gene, which produces an oncogenic 
transcription factor, is tightly regulated in normal cells but is frequently deregulated in 
human cancers. The protein encoded by this gene is a transcriptional repressor thought to
negatively regulate MYC function, and is therefore a potential tumor suppressor. The 
protein inhibits the transcriptional activity of MYC by competing for MAX, another basic 
helix-loop-helix protein that binds to MYC and is required for its function. Defects in this 
gene are frequently found in patients with prostate tumors. Two transcript variants 
encoding different isoforms have been identified for this gene. 

[0186] Nucleotides 1 to 64 of SEQ ID NO: 77 (L07648) show 100% sequence 

identity to ARHA which encodes ras homolog gene family, member A. The gene has 
LocusID: 387, and is located on chromosome 3 with reported cytogenetic location 3p21.3. 
The gene product is a ras-related GTP binding protein of the rho subfamily, and may be 
involved in regulation of reorganization of the actin cytoskeleton. 

[0187] CPS 78 corresponds to CSDA which encodes cold shock domain protein A. 

The gene has LocusID: 8531, and is located on chromosome 12 with reported cytogenetic 
location 12p13.1. The gene product is a member of a family of transcriptional regulators. It
can bind and repress the promoter of the granulocyte-macrophage colony-stimulating factor (GM-CSF) gene. The gene product contains a
cold-shock domain. 

[0188] CPS 78, as well as nucleotides 14 to 1568 of M24069 (SEQ ID NO: 78), 

show at least 94% sequence identity to LOC220558. LOC220558 also encodes cold shock 
domain protein A. It is located on chromosome 16 with
reported cytogenetic location 16p11.1.

[0189] CPS 79 corresponds to OPTN (FIP2) which encodes optineurin. The gene 

has LocusID: 10133, and is located on chromosome 10 with reported cytogenetic location 
10p12.33. The gene product is a component of a heterodimeric complex that inhibits
cytolysis induced by tumor necrosis factor alpha. It contains leucine zippers. It is also 
known as tumor necrosis factor alpha-inducible cellular protein containing leucine zipper 
domains or Huntingtin interacting protein L. 

[0190] CPS 80 corresponds to SELENBP1 which encodes selenium binding protein 

1. The gene has LocusID: 8991, and is located on chromosome 1 with reported cytogenetic 
location 1q21-q22. This gene product belongs to the selenium-binding protein family.
Selenium is a nutrient that exhibits potent anticarcinogenic properties, and deficiency of 
selenium may cause certain neurologic diseases. It has been proposed that the effects of 
selenium in preventing cancer and neurologic diseases may be mediated by selenium- 
binding proteins. The exact function of this gene is not known. 

[0191] CPS 81 corresponds to PPP1R2 which encodes protein phosphatase 1, 

regulatory (inhibitor) subunit 2. The gene has LocusID: 5504, and is located on 
chromosome 3 with reported cytogenetic location 3q29. Inhibitory subunit 2 of protein 
phosphatase 1 may associate with the gamma isoform of protein phosphatase 1.
[0192] Nucleotides 25 to 556 of SEQ ID NO: 81 (U68111) also have 96% sequence

identity to LOC153743. This gene encodes a protein similar to protein phosphatase 1, 
regulatory (inhibitor) subunit 2. The gene has reported cytogenetic location 5q33.2. 
[0193] In addition, nucleotides 25 to 556 of U68111 have 85-90% sequence identity

to certain other genes or genomic sequences. These genes or genomic sequences include 
PPP1R2P1, the region 3' to LOC160817, the non-coding region of LOC130957, the non- 
coding region of LOC220419, and certain regions in chromosomes 7 and 21. PPP1R2P1 
encodes protein phosphatase 1, regulatory (inhibitor) subunit 2 pseudogene 1. PPP1R2P1 
has LocusID: 5505, and is located on chromosome 6 with reported cytogenetic location 
6p21.1. LOC160817 encodes a protein similar to protein phosphatase 1, regulatory
(inhibitor) subunit 2, and has reported cytogenetic location 13q21.1. LOC130957 encodes a
protein similar to protein phosphatase 1, regulatory (inhibitor) subunit 2, and is located at
chromosome 2q12.1. LOC220419 is reported to encode protein phosphatase 1, regulatory
(inhibitor) subunit 2, and is located at chromosome 13q14.11.
[0194] CPS 82 corresponds to HPGD which encodes hydroxyprostaglandin

dehydrogenase 15-(NAD). The gene has LocusID: 3248, and is located on chromosome 4 
with reported cytogenetic location 4q34-q35. The gene product can inactivate many 
prostaglandins by oxidation of their C-15 residues. 

[0195] CPS 83 corresponds to SLC4A1 which encodes solute carrier family 4, anion 

exchanger, member 1 (erythrocyte membrane protein band 3, Diego blood group). The 
gene has LocusID: 6521, and is located on chromosome 17 with reported cytogenetic 
location 17q21-q22. The genomic sequence aligning to CPS 83 is located 3' to the
polypeptide-coding sequence of the gene. The gene is also known as CD233 gene. The 
gene product, also known as Band 3 anion exchanger, is part of the anion exchanger (AE) 
family. The gene product may function to maintain ion homeostasis by transporting 
chloride and bicarbonate ions. 

[0196] SEQ ID NO: 259 (M27819) also aligns to SLC4A1 with over 98% sequence 

identity, and therefore, can be used as a probe for SLC4A1. Nucleotides 2206 to 2426 of 
SEQ ID NO: 259 also show about 76% sequence identity to SLC4A2. This gene encodes 
solute carrier family 4, anion exchanger, member 2 (erythrocyte membrane protein band 3- 
like 1). The gene has LocusID: 6522. 

[0197] CPS 84 corresponds to IL17R which encodes interleukin 17 receptor. The 

gene has LocusID: 23765, and is located on chromosome 22 with reported cytogenetic 
location 22q11.1. The gene product is highly similar to murine Il17r, and may play a role in
T cell activation and induction of IL-2 (IL2). 

[0198] CPS 87 corresponds to CBFA2T3 which encodes core-binding factor, runt 

domain, alpha subunit 2; translocated to, 3. The gene has LocusID: 863, and is located on 
chromosome 16 with reported cytogenetic location 16q24. The gene product is a member 
of the MTG8 (ETO/CDR) protein family. 

[0199] CPS 89 corresponds to an intron sequence of RAP1GA1. RAP1GA1

encodes GTPase activating protein 1 for rap1. The RAP1GA1 gene has LocusID: 5909, and is
located on chromosome 1 with reported cytogenetic location 1p36.1-p35. The gene product
is also known as KIAA0474 gene product. 

[0200] CPS 90 corresponds to BCL2L1 which encodes BCL2-like 1. The gene has

LocusID: 598, and is located on chromosome 20 with reported cytogenetic location 
20q11.1. The protein encoded by this gene belongs to the BCL-2 protein family. BCL-2
family members form hetero- or homodimers and act as anti- or pro-apoptotic regulators 
that are involved in a wide variety of cellular activities. The proteins encoded by this gene
are located at the outer mitochondrial membrane, and have been shown to regulate outer 
mitochondrial membrane channel (VDAC) opening. VDAC regulates mitochondrial 
membrane potential, and thus controls the production of reactive oxygen species and release
of cytochrome C by mitochondria, both of which are potent inducers of cell apoptosis.
At least two alternatively spliced transcript variants, which encode distinct isoforms, have 
been reported. The longer isoform may act as an apoptotic inhibitor and the shorter form 
may act as an apoptotic activator. 

[0201] CPS 91 corresponds to COPEB which encodes core promoter element 

binding protein. The gene has LocusID: 1316, and is located on chromosome 10 with 
reported cytogenetic location 10pl5. This gene encodes a nuclear protein (core promoter 
element binding protein). This protein has three zinc fingers at the end of its C-terminal
domain, a serine/threonine-rich central region and an acidic domain lying within the N- 
terminal region. The zinc fingers of this protein are believed to be responsible for the 
specific DNA binding with the guanine-rich core promoter elements. The central region 
might be involved in activation or posttranslational regulatory pathways, and the acidic N- 
terminal domain might play an important role in the process of transcriptional activation. 
This protein is expressed in several tissues, with high levels in the placenta. It is a
transcriptional activator, capable of activating transcription approximately 4-fold either on
homologous or heterologous promoters. The DNA binding and transcriptional activity of 
this protein, in conjunction with its expression pattern, suggests that this protein may 
participate in the regulation and/or maintenance of the basal expression of pregnancy- 
specific glycoprotein gene and possibly other TATA box-less genes. The genomic 
sequence aligning to CPS 91 is located 3' to the polypeptide-coding sequence of the gene.
[0202] CPS 92 corresponds to ADM which encodes adrenomedullin. The gene has 

LocusID: 133, and is located on chromosome 11 with reported cytogenetic location 
11p15.4. Adrenomedullin, a hypotensive peptide found in human pheochromocytoma,
consists of 52 amino acids, has one intramolecular disulfide bond, and shows a slight 
homology with the calcitonin gene-related peptide. It may function as a hormone in 
circulation control because it is found in blood in a considerable concentration. The 
precursor, called preproadrenomedullin, is 185 amino acids long. By RNA-blot analysis, 
human adrenomedullin mRNA was found to be highly expressed in several tissues. 
Genomic ADM DNA consists of at least 4 exons and 3 introns, with the 5-prime flanking 
region containing TATA, CAAT, and GC boxes. There are also multiple binding sites for
activator protein-2 and a cAMP-regulated enhancer element. The gene also encodes the 
precursor of adrenomedullin (AM) and the putative 20 amino acid peptide proAM-N20. 
The gene product may regulate blood pressure and heart rate. 

[0203] CPS 93 corresponds to SPTB which encodes spectrin, beta, erythrocytic 

(includes spherocytosis, clinical type I). The gene has LocusID: 6710, and is located on 
chromosome 14 with reported cytogenetic location 14q23-q24.2. Beta spectrin (beta- 
fodrin) may crosslink actin proteins of the membrane-associated cytoskeleton. It is a 
member of a family of actin-crosslinking proteins.

[0204] CPS 94 corresponds to ITGA2B which encodes integrin, alpha 2b (platelet 

glycoprotein IIb of IIb/IIIa complex, antigen CD41B). The gene has LocusID: 3674, and is
located on chromosome 17 with reported cytogenetic location 17q21.32. Integrins are 
heterodimeric integral membrane proteins composed of an alpha chain and a beta chain. 
Alpha chain 2b undergoes post-translational cleavage to yield disulfide-linked light and 
heavy chains that join with beta 3 to form a fibronectin receptor expressed in platelets that 
plays a crucial role in coagulation. Mutations that interfere with this role may result in 
thrombasthenia. In addition to adhesion, integrins are known to participate in cell-surface 
mediated signalling. The gene product can act as a receptor for fibrinogen, von Willebrand 
factor and fibronectin.

[0205] CPS 95 corresponds to CTNNAL1 which encodes catenin (cadherin- 

associated protein), alpha-like 1. The gene has LocusID: 8727, and is located on 
chromosome 9 with reported cytogenetic location 9q31.2. Alpha-like 1 catenin
(cadherin-associated protein) links cadherins to the cytoskeleton. The protein is a member of the
catenin family of cadherin-binding proteins. 

[0206] CPS 96 corresponds to SCYA2 which encodes small inducible cytokine A2 

(monocyte chemotactic protein 1). The gene has LocusID: 6347, and is located on 
chromosome 17 with reported cytogenetic location 17q11.2-q21.1. Cytokine A2 is a
chemotactic factor for monocytes. 

[0207] CPS 97 corresponds to NDUFB7 which encodes NADH dehydrogenase 

(ubiquinone) 1 beta subcomplex, 7 (18kD, B18). The gene has LocusID: 4713, and is 
located on chromosome 19 with reported cytogenetic location 19p13.12-p13.11. The gene
product is a subunit of the NADH-ubiquinone oxidoreductase (complex I).
[0208] CPS 98 corresponds to SCYA7 which encodes small inducible cytokine A7

(monocyte chemotactic protein 3). The gene has LocusID: 6354, and is located on 
chromosome 17 with reported cytogenetic location 17q11.2-q12. This gene encodes
monocyte chemotactic protein 3, a secreted chemokine which attracts macrophages during 
inflammation and metastasis. It is a member of the C-C subfamily of chemokines which are 
characterized by having two adjacent cysteine residues. The protein is an in vivo substrate 
of matrix metalloproteinase 2, an enzyme which degrades components of the extracellular 
matrix. SCYA7 is part of a cluster of C-C chemokine family members on chromosome 17q. 
[0209] Nucleotides 1 to 246 of SEQ ID NO: 95 (X72308) have about 95% sequence 

identity to at least two other genomic sequences. The first genomic sequence is located 
between the polypeptide-coding sequences of AMPD3 and ZFP26. The second genomic 
sequence is located near LOC139170. AMPD3 encodes adenosine monophosphate
deaminase (isoform E), and has LocusID: 272. The gene is located at chromosome 11p15.
ZFP26 encodes C3HC4-like zinc finger protein, and has LocusID: 50862. The gene is
located at chromosome 11p15.3. LOC139170 encodes a protein similar to KIAA1892
protein, and is located at chromosome Xq25. 

[0210] CPS 99 corresponds to FCGR1A which encodes Fc fragment of IgG, high

affinity Ia, receptor for (CD64). The gene has LocusID: 2209, and is located on
chromosome 1 with reported cytogenetic location 1q21.2-q21.3. The gene product has a
role in immune response, and is a member of the immunoglobulin superfamily. 
[0211] CPS 100 corresponds to EPB49 which encodes erythrocyte membrane 

protein band 4.9 (dematin). The gene has LocusID: 2039, and is located on chromosome 8 
with reported cytogenetic location 8p21.1. Dematin may bind to actin. It is a member of 
the villin family of actin-bundling proteins. 

[0212] CPS 101 corresponds to DD96 which encodes epithelial protein up-regulated 

in carcinoma, membrane associated protein 17. The gene has LocusID: 10158, and is 
located on chromosome 1 with reported cytogenetic location 1p33. The gene is reported to
be up-regulated in malignant epithelial cells of renal cell carcinomas, as well as in 
carcinomas of colon, breast and lung. 

[0213] Nucleotides 1 to 87 of SEQ ID NO: 98 (U21049) show about 98% sequence 

identity to LOC222094. LOC222094 encodes cell division cycle 2-like 5 (isoform 1), 
cholinesterase-related cell division controller, and CDC2-related protein kinase 5. It is
located at chromosome 7p15.2.
[0214] CPS 102 corresponds to PPARG which encodes peroxisome proliferative

activated receptor, gamma. The gene has LocusID: 5468, and is located on chromosome 3 
with reported cytogenetic location 3p25. The protein encoded by this gene is a member of 
the peroxisome proliferator-activated receptor (PPAR) subfamily of nuclear receptors. 
PPARs form heterodimers with retinoid X receptors (RXRs) and these heterodimers 
regulate transcription of various genes. Three subtypes of PPARs are known: PPAR-alpha, 
PPAR-delta, and PPAR-gamma. The protein encoded by this gene is PPAR-gamma and is 
a regulator of adipocyte differentiation. Additionally, PPAR-gamma has been implicated in 
the pathology of numerous diseases including obesity, diabetes, atherosclerosis and cancer. 
Multiple transcript variants that use alternate promoters and splicing have been identified 
for this gene. At least three of these variants encode the same isoform. 

[0215] Nucleotides 1 to 77 of SEQ ID NO: 99 (L40904) have 100% sequence 

identity to HBA2. HBA2 encodes hemoglobin, alpha 2, and has LocusID: 3040. The gene 
is located at chromosome 16 with reported cytogenetic location 16p13.3.
[0216] Affymetrix annotation suggests that CPS 103 corresponds to SPINK1. Blast 

search against the Entrez human genome database shows that CPS 103 also aligns to a 
genomic sequence between SCGB3A2 and KIAA0555 with at least 97% sequence identity. 
SCGB3A2 encodes secretoglobin, family 3A, member 2. SCGB3A2 and KIAA0555 are 
located at chromosome 5q32. 

[0217] CPS 104 corresponds to PLAUR which encodes plasminogen activator, 

urokinase receptor. The gene has LocusID: 5329, and is located on chromosome 19 with 
reported cytogenetic location 19q13. The gene product, urokinase-type plasminogen
activator receptor, may function in pericellular plasminogen activation. 
[0218] CPS 105 corresponds to CDC34 which encodes cell division cycle 34. The 

gene has LocusID: 997, and is located on chromosome 19 with reported cytogenetic 
location 19p13.3. The protein encoded by this gene is a member of the
ubiquitin-conjugating enzyme family. Ubiquitin-conjugating enzyme catalyzes the covalent
attachment of ubiquitin to other proteins. The CDC34 gene product may be a part of the large
multiprotein complex, which is involved in ubiquitin-mediated degradation of cell cycle G1
regulators and the initiation of DNA replication. The gene product is similar to S. 
cerevisiae Cdc34p, and may covalently attach ubiquitin to substrate proteins. 
[0219] CPS 106 corresponds to UNK_AI732885 which shows 100% sequence 

identity with an intron sequence of CG005. CG005 encodes a hypothetical protein from 
BRCA2 region. The CG005 gene has LocusID: 10443, and is located on chromosome 13 with
reported cytogenetic location 13q12-q13. The gene product contains a region having low
similarity to a region of rat 2',3'-cyclic nucleotide 3'-phosphodiesterase.
[0220] CPS 107 corresponds to IL10RA which encodes interleukin 10 receptor, 

alpha. The gene has LocusID: 3587, and is located on chromosome 11 with reported 
cytogenetic location 11q23. Nucleotides 3467 to 3496 of U00672 have 100% sequence
identity to LOC200074 which is located at chromosome 1p34.3.

[0221] CPS 108 corresponds to FBXO7 (FBX7) which encodes F-box only protein

7. The gene has LocusID: 25793, and is located on chromosome 22 with reported 
cytogenetic location 22q12-q13. This gene encodes a member of the F-box protein family
which is characterized by an approximately 40 amino acid motif, the F-box. The F-box 
proteins constitute one of the four subunits of the ubiquitin protein ligase complex called 
SCFs (SKP1-cullin-F-box), which functions in phosphorylation-dependent ubiquitination.
The F-box proteins are divided into 3 classes: Fbws containing WD-40 domains, Fbls 
containing leucine-rich repeats, and Fbxs containing either different protein-protein 
interaction modules or no recognizable motifs. The protein encoded by FBXO7 belongs to
the Fbxs class and it may play a role in regulation of hematopoiesis. Alternatively spliced 
transcript variants of this gene have been reported, but the full length nature of the variants 
has not been defined. 

[0222] CPS 109 corresponds to IFIT4 which encodes interferon-induced protein 

with tetratricopeptide repeats 4. The gene has LocusID: 3437, and is located on 
chromosome 10 with reported cytogenetic location 10q24. 

[0223] CPS 110 corresponds to BAX which encodes BCL2-associated X protein. 

The gene has LocusID: 581, and is located on chromosome 19 with reported cytogenetic 
location 19q13.3-q13.4. The protein encoded by this gene belongs to the BCL2 protein
family. BCL2 family members form hetero- or homodimers and act as anti- or pro- 
apoptotic regulators that are involved in a wide variety of cellular activities. BAX gene 
product forms a heterodimer with BCL2, and may function as an apoptotic activator. This 
gene product is reported to interact with, and increase the opening of, the mitochondrial 
voltage-dependent anion channel (VDAC), which leads to the loss in membrane potential 
and the release of cytochrome c. The expression of this gene is regulated by the tumor 
suppressor P53 and has been shown to be involved in P53-mediated apoptosis. Six 
alternatively spliced transcript variants, which encode different isoforms, have been 
reported for this gene. The gene product may induce caspase activation by increasing
mitochondrial permeability, and may function in cooperation with the adenine nucleotide 
translocator (ANT). 

[0224] CPS 111 corresponds to BSG which encodes basigin (OK blood group). The

gene has LocusID: 682, and is located on chromosome 19 with reported cytogenetic 
location 19p13.3. Basigin (also known as tumor cell-derived collagenase stimulatory factor,
extracellular matrix metalloproteinase inducer, M6 antigen) may stimulate matrix 
metalloproteinase synthesis in fibroblasts. It is a member of the immunoglobulin 
superfamily. 

[0225] CPS 111 also aligns to LOC199717 with over 97% sequence identity. 

LOC199717 encodes a protein similar to basigin. LOC199717 is located on chromosome
19 with reported cytogenetic location 19p13.3.

[0226] CPS 112 corresponds to THBS1 which encodes thrombospondin 1. The 

gene has LocusID: 7057, and is located on chromosome 15 with reported cytogenetic 
location 15q15. Thrombospondin-1 may have a role in blood clotting and in angiogenesis.
It is a member of a family of adhesive molecules. 

[0227] CPS 113 corresponds to AP1G2 (G2AD) which encodes adaptor-related 

protein complex 1, gamma 2 subunit. The gene has LocusID: 8906, and is located on 
chromosome 14 with reported cytogenetic location 14q11.2. Adaptins are important
components of clathrin-coated vesicles transporting ligand-receptor complexes from the 
plasma membrane or from the trans-Golgi network to lysosomes. The adaptin family of 
proteins is composed of four classes of molecules named alpha-, beta-, beta prime- and
gamma-adaptins. Adaptins, together with medium and small subunits, form a
heterotetrameric complex called an adaptor, whose role may be to promote the formation of 
clathrin-coated pits and vesicles. The protein encoded by this gene is a gamma-adaptin
protein which belongs to the adaptor complexes large subunits family. Gamma-adaptin is 
thought to function at some trafficking step in the complex pathways between the trans- 
Golgi network and the cell surface. There are two alternatively spliced transcript variants of 
this gene encoding the same protein. The gene product can interact with beta-1 adaptin and 
sigma 1 chain of the AP-1 complex. 

[0228] CPS 115 corresponds to RALBP1 which encodes ralA binding protein 1. 

The gene has LocusID: 10928, and is located on chromosome 18 with reported cytogenetic 
location 18p11.3. RalA binding protein 1 can interact with activated Ral.
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[0229] CPS 115 also aligns to KIAA1634 with about 99% sequence identity. 

KIAA1634 encodes KIAA1634 protein, and is located at chromosome lpl2-pll.2. In 
addition, CPS 115 shows about 89-92% sequence identity to LOC129522, LOC131054 and 
a genomic sequence on chromosome 2. LOCI 29522 encodes a protein similar to ralA 
binding protein 1, and is located at chromosome 2qll.2. LOCI 3 1054 encodes a protein 
similar to ralA binding protein 1, and is located at chromosome 3q27.2. Nucleotides 3565 
to 3875 of L42542 have 94% sequence identity to a chromosome-6 genomic sequence 
which is located near the polypeptide^coding sequence of LOC221511. LOC221511 
encodes MHC class II DP3-alpha, and is located at chromosome 6p21.2. 
[0230] CPS 116 corresponds to UNK_AF070587 which is located in an intron of the 

putative gene LOC196932. LOC196932 gene encodes a protein similar to hypothetical 
protein LOC55580. LOC196932 is located on chromosome 14 with reported cytogenetic 
location 14q32.12. 

[0231] Affymetrix annotation suggests that CPS 117 corresponds to DUX1. Blast 

search against the Entrez human genome database shows that CPS 117 also aligns to 
LOC200133 and LOC131115 with about 82-86% sequence identity. LOC200133 encodes a 
protein similar to double homeobox, 4 (double homeobox protein 4). It is located at 
chromosome 1p31.3. LOC131115 encodes a protein similar to double homeobox protein, 
and is located at chromosome 3p14.1. 

[0232] Nucleotides 1 to 698 of SEQ ID NO: 113 (AJ001481) show about 88% 

sequence identity to DUX4, LOC201498, a genomic sequence near LOC131308, and a 
genomic sequence near hypothetical gene LOC132684. DUX4 encodes double homeobox, 
4. It has LocusID: 22947, and is located on chromosome 4 with reported cytogenetic 
location 4q35. LOC201498 encodes a protein similar to FSHD Region Gene 2 protein, and 
is located on chromosome 18. LOC131308 encodes a protein similar to FSHD Region 
Gene 2 protein, and is located at chromosome 3p14.1. LOC132684 is located at 
chromosome 4q35.2. 

[0233] CPS 118 corresponds to SLC6A8 which encodes solute carrier family 6 

(neurotransmitter transporter, creatine), member 8. The gene has LocusID: 6535, and is 
located on chromosome X with reported cytogenetic location Xq28. The gene product is a 
sodium and chloride-dependent creatine transporter. It is a member of the neurotransmitter 
transporter family. 

[0234] CPS 118 also has about 95% sequence identity to a genomic region on 

chromosome 16. This region includes or overlaps genes LOC162151 and LOC146488. 
LOC146488 encodes a protein similar to disintegrin-like testicular metalloproteinase (EC 
3.4.24.-) IVb - crab-eating macaque (fragment). The region has reported cytogenetic 
location 16p11.1. In addition, CPS 118 has about 95% sequence identity to a genomic 
sequence which includes or overlaps putative genes LOC204478 and LOC146493. 
LOC146493 encodes a protein similar to sodium- and chloride-dependent creatine 
transporter 2 (CT2). 

[0235] Nucleotides 13923 to 14462 of SEQ ID NO: 114 (U36341) have about 94% 

sequence identity to a chromosomal region which is located 5' to CTAG2 and 3' to GAB3. 
CTAG2 encodes cancer/testis antigen 2, and has LocusID: 30848. It is located at 
chromosome Xq28. GAB3 encodes GRB2-associated binding protein 3, and has LocusID: 
139716. It is also located at chromosome Xq28. 

[0236] CPS 119 corresponds to THBD which encodes thrombomodulin. The gene 

has LocusID: 7056, and is located on chromosome 20 with reported cytogenetic location 
20p12-cen. Thrombomodulin can change the procoagulant thrombin into an anticoagulant. 

[0237] Nucleotides 3867 to 4212 of SEQ ID NO: 115 (J02973) align to a genomic 

sequence on chromosome 2 with 97% sequence identity. The genomic sequence is located 
between LOC200422, which encodes a protein similar to somatostatin receptor, and 
LOC205172. Both LOC200422 and LOC205172 have reported cytogenetic location 2p12. 

[0238] Blast search against the Entrez human genome database shows that SEQ ID 

NO: 116 (CPS 120) has about 99% sequence identity to the protein-coding strand of 
LOC203068 which encodes a protein similar to tubulin, beta 5. LOC203068 is located on 
chromosome 6. In addition, SEQ ID NO: 116 has at least 99% sequence identity with 
LOC157586 and LOC157584. LOC157586 and LOC157584 encode proteins similar to 
hypothetical protein DKFZp564N123.1 (human fragment). Both LOC157586 and 
LOC157584 are located on chromosome 6. SEQ ID NO: 116 (AF141349) also has 97% 
sequence identity with the protein-coding strand of LOC92755. LOC92755 is located at 
chromosome 8p21.1. 

[0239] Nucleotides 14 to 1586 of SEQ ID NO: 116 have 91% sequence identity to 

LOC222017 which is located at chromosome 7p14.1. Nucleotides 15 to 1572 of SEQ ID 
NO: 116 have 87% sequence identity to an intron sequence of SCP2. SCP2 encodes sterol 
carrier protein 2, and has LocusID: 6342. It is located at chromosome 1p32. Sterol carrier 
protein 2 may have a role in regulation of steroidogenesis. Moreover, nucleotides 439 to 
1474 of SEQ ID NO: 116 share 85% sequence identity to TUBB5 which encodes tubulin, 
beta, 5. TUBB5 has LocusID: 10382, and is located at chromosome 19p13.3. Beta 5-tubulin 
can polymerize to form microtubules, and it is a member of a family of structural 
proteins. Nucleotides 421 to 1444 of SEQ ID NO: 116 also have 84% sequence identity to 
TUBB4. TUBB4 encodes tubulin, beta, 4, and has LocusID: 10381. It is located at 
chromosome 16q24.3. Nucleotides 142 to 1474 of SEQ ID NO: 116 align to LOC139112 
with 80% sequence identity. LOC139112 encodes a protein similar to tubulin beta, and is 
located at chromosome Xq25. 

[0240] CPS 123 corresponds to HBE1 which encodes hemoglobin, epsilon 1. The 

gene has LocusID: 3046, and is located on chromosome 11 with reported cytogenetic 
location 11p15.5. The epsilon globin gene (HBE) is expressed in the embryonic yolk sac. 
Two epsilon chains together with two zeta chains (an alpha-like globin) constitute the 
embryonic hemoglobin Hb Gower I, and two epsilon chains together with two alpha chains 
form the embryonic Hb Gower II. Both of these embryonic hemoglobins are normally 
supplanted by fetal, and later, adult hemoglobin. The five beta-like globin genes are found 
within a 45 kb cluster on chromosome 11 in the following order: 5'-epsilon - G-gamma - 
A-gamma - delta - beta-3'. Hemoglobin epsilon 1 (embryonic beta-like) can transport 
oxygen and carbon dioxide between the lung and tissues, and modulate erythrocyte 
metabolism and senescence. 

[0241] CPS 125 corresponds to MAD which encodes MAX dimerization protein. 

The gene has LocusID: 4084, and is located on chromosome 2 with reported cytogenetic 
location 2p13-p12. MAX dimerization protein belongs to a subfamily of MAX-interacting 
proteins. MAD gene product competes with MYC for binding to MAX to form a sequence- 
specific DNA-binding complex. MAD gene product may act as a transcriptional repressor 
while MYC appears to function as an activator. MAD is a candidate tumor 
suppressor gene. The gene product is a basic helix-loop-helix, leucine zipper protein that 
dimerizes with MAX, and can form a heterodimer with MAX and repress transcription. 
The gene product may also antagonize c-Myc (MYC) and promote cellular differentiation. 
[0242] CPS 126 corresponds to TSPAN-5 which encodes tetraspan 5. The gene has 

LocusID: 10098, and is located on chromosome 4 with reported cytogenetic location 4q23. 
The protein encoded by this gene is a member of the transmembrane 4 superfamily, also 
known as the tetraspanin family. Many members of the superfamily are cell-surface 
proteins that are characterized by the presence of four hydrophobic domains. These 
proteins may mediate signal transduction events involved in the regulation of cell 
development, activation, growth and motility. 

[0243] CPS 127 corresponds to BAG1 which encodes BCL2-associated athanogene. 

The gene has LocusID: 573, and is located on chromosome 9 with reported cytogenetic 
location 9p12. The oncogene BCL2 is a membrane protein that blocks a step in a pathway 
leading to apoptosis or programmed cell death. The BAG1 protein binds to BCL2 and is 
referred to as BCL2-associated athanogene. BAG1 enhances the anti-apoptotic effects of 
BCL2 and represents a link between growth factor receptors and anti-apoptotic 
mechanisms. BAG1 interacts with both the hepatocyte growth factor receptor and the 
platelet-derived growth factor receptor and, in both cases, enhances growth factor-mediated 
protection from apoptosis. At least three proteins, BAG-1L, BAG-1M and BAG-1, are 
encoded by the BAG-1 mRNA through the use of alternative translation initiation sites. 
[0244] Nucleotides 454 to 1006 of SEQ ID NO: 120 (Z35491) have 88% sequence 

identity to a chromosomal region on chromosome X. In addition, nucleotides 517 to 646 of 
SEQ ID NO: 120 align to LOC205900 with 100% sequence identity. LOC205900 encodes 
a protein similar to serine protease inhibitor Kazal-type 4 precursor (Peptide PEC-60 
homolog). LOC205900 is located on chromosome 4. 

[0245] CPS 128 corresponds to PADI2 (PDI2) which encodes peptidyl arginine 

deiminase, type II. The gene has LocusID: 11240, and is located on chromosome 1 with 
reported cytogenetic location 1p35.2-p35.1. The gene product is similar to rat skeletal 
muscle peptidyl arginine deiminase, type II, and may convert arginine residues within 
proteins to citrulline residues. 

[0246] Nucleotides 3315 to 4119 of SEQ ID NO: 121 (AB023211) align with 

PRKG1 with 79% sequence identity. PRKG1 encodes protein kinase, cGMP-dependent, 
type I, and has LocusID: 5592. Type I cGMP-dependent protein kinase may relax vascular 
smooth muscle and inhibit platelet aggregation. The gene is located at chromosome 
10q11.2. Nucleotides 1375 to 1500 of SEQ ID NO: 121 have 85% sequence identity with 
PADI1 which encodes peptidyl arginine deiminase, type I. PADI1 has LocusID: 29943, 
and is located on chromosome 1 with reported cytogenetic location 1p36.13. 
[0247] CPS 129 corresponds to IL1R1 which encodes interleukin 1 receptor, type I. 

The gene has LocusID: 3554, and is located on chromosome 2 with reported cytogenetic 
location 2q12. Type I interleukin-1 receptor can bind all three forms of interleukin-1 
(IL1A, IL1B, and IL1RN). The protein contains immunoglobulin domains. 
[0248] CPS 130 corresponds to NP which encodes nucleoside phosphorylase. The 

gene has LocusID: 4860, and is located on chromosome 14 with reported cytogenetic 
location 14q13.1. NP encodes the enzyme purine nucleoside phosphorylase. The encoded 
protein, together with adenosine deaminase (ADA), serves a key role in purine catabolism, 
which is referred to as the salvage pathway. Mutations in the encoded protein may result in 
a severe combined immunodeficiency (SCID). 

[0249] CPS 131 corresponds to the 3' untranslated region of AQP3 which encodes 

aquaporin 3. The gene has LocusID: 360, and is located on chromosome 9 with reported 
cytogenetic location 9p13. CPS 131 is located in the 3' untranslated region of AQP3. 
Aquaporin 3 is a water channel protein. Aquaporins are a family of small integral 
membrane proteins related to the major intrinsic protein (MIP or AQP0). Aquaporin 3 is 
localized at the basal lateral membranes of collecting duct cells in the kidney. In addition to 
its water channel function, aquaporin 3 has been found to facilitate the transport of nonionic 
small solutes such as urea and glycerol, but to a smaller degree. It has been suggested that 
water channels can be functionally heterogeneous and possess water and solute permeation 
mechanisms. 

[0250] CPS 132 corresponds to GSPT1 which encodes G1 to S phase transition 1. 

The gene has LocusID: 2935, and is located on chromosome 16 with reported cytogenetic 
location 16p13.1. The gene product is a GTP-binding protein, and has GTP-binding 
activity. The product is similar to polypeptide chain elongation factor EF1 alpha (EEF1A1) 
and may have a role in G1 to S phase transition. 

[0251] CPS 132 has about 85% sequence identity with LOC120337. LOC120337 

encodes a protein similar to G1 to S phase transition protein 1 homolog (GTP-binding 
protein GST1-HS). LOC120337 is located at chromosome 11q22.3. Nucleotides 2301 to 
2587 of X17644 align with a genomic sequence located 5' to GNB2 with sequence identity 
of 83%. GNB2 encodes guanine nucleotide binding protein (G protein), beta polypeptide 2. 
GNB2 has LocusID: 2783, and is located on chromosome 7 with reported cytogenetic 
location 7q22. Nucleotides 291 to 576 and 585 to 2494 of SEQ ID NO: 125 (X17644) have 
82-87% sequence identity with GSPT2 which encodes G1 to S phase transition 2. GSPT2 
has LocusID: 23708, and is located on chromosome 5. Nucleotides 2522 to 2587 of SEQ 
ID NO: 125 have 93% sequence identity with an intron sequence of LOC153643. 
LOC153643 encodes a protein similar to hypothetical protein FLJ14957, and is located at 
chromosome 5q21.1. 

[0252] CPS 133 corresponds to GABARAPL2 (GEF-2) which encodes GABA(A) 

receptor-associated protein-like 2. The gene has LocusID: 11345, and is located on 
chromosome 16 with reported cytogenetic location 16q22.3-q24.1. The gene product is a 
phosphoprotein and contains putative actin and nucleotide binding sites. The alternative 
names for the gene product include GEF2 or ganglioside expression factor 2. 
[0253] CPS 133 also has about 81-82% sequence identity with a genomic sequence 

located 3' to LOC206774, and an intron sequence of RAB3-GAP150. LOC206774 is 
located at chromosome 8q24.12. RAB3-GAP150 encodes the non-catalytic subunit 
(150kD) of the rab3 GTPase-activating protein. RAB3-GAP150 has LocusID: 25782, and 
is located at chromosome 1q42.12. Nucleotides 26 to 253 of SEQ ID NO: 126 (AI565760) 
have about 84% sequence identity with an intron sequence of ACCN1. ACCN1 encodes 
amiloride-sensitive cation channel 1, neuronal (degenerin). ACCN1 has LocusID: 40, and 
is located at chromosome 17q11.2-q12. 

[0254] CPS 134 corresponds to HBD which encodes hemoglobin, delta. The gene is 

located on chromosome 11 with reported cytogenetic location 11p15.5. The gene has 
LocusID: 3043. HBB, which encodes hemoglobin, beta, is also located in this 
chromosomal region. The alpha (HBA) and beta (HBB) loci determine the structure of the 
2 types of polypeptide chains in adult hemoglobin, Hb A. The normal adult hemoglobin 
tetramer consists of two alpha chains and two beta chains. Mutant beta globin causes sickle 
cell anemia. Absence of beta chain causes beta-zero-thalassemia. Reduced amounts of 
detectable beta globin cause beta-plus-thalassemia. The order of the genes in the 
beta-globin cluster is 5'-epsilon - gamma-G - gamma-A - delta - beta-3'. 

[0255] A fragment of CPS 134 (nucleotides 2 to 366 of SEQ ID NO: 127) aligns to 

HBB with 93-96% sequence identity. Moreover, another fragment of CPS 134 (nucleotides 
157 to 364 of SEQ ID NO: 127) has 80% sequence identity with HBE1. HBE1 encodes 
hemoglobin, epsilon 1. It has LocusID: 3046, and is located at chromosome 11p15.5. 
[0256] CPS 135 corresponds to HAGH which encodes hydroxyacyl glutathione 

hydrolase. The gene has LocusID: 3029, and is located on chromosome 16 with reported 
cytogenetic location 16p13.3. The enzyme encoded by this gene is classified as a 
thiolesterase and is responsible for the hydrolysis of S-lactoyl-glutathione to reduced 
glutathione and D-lactate. 

[0257] CPS 136 corresponds to ERN1 which encodes ER to nucleus signalling 1. 

The gene has LocusID: 2081, and is located on chromosome 17. The gene product is a 
human homolog of the yeast Ire1 gene product. The ERN1 protein is important in altering 
gene expression as a response to endoplasmic reticulum based stress signals. The ERN1 
protein is a transmembrane endoplasmic reticulum protein, and may act as a sensor of the 
unfolded protein response pathway. 

[0258] Nucleotides 1504 to 1536 of SEQ ID NO: 129 (AF059198) have 96% 

sequence identity with a chromosomal region on chromosome 3. The region is near 
LOC152282 which encodes a protein similar to homeobox protein goosecoid. LOC152282 
is located at chromosome 3p25.1. 

[0259] CPS 137 corresponds to COL9A1 which encodes collagen, type IX, alpha 1. 

The gene has LocusID: 1297, and is located on chromosome 6 with reported cytogenetic 
location 6q12-q14. This gene encodes one of the three alpha chains of type IX collagen, a 
major collagen component of hyaline cartilage. Type IX collagen is usually found in tissues 
containing type II collagen, a fibrillar collagen. Studies in knockout mice have shown that 
synthesis of the alpha 1 chain is essential for assembly of type IX collagen molecules, a 
heterotrimeric molecule, and that lack of type IX collagen is associated with early onset 
osteoarthritis. Mutations in this gene may be associated with multiple epiphyseal dysplasia. 
Two transcript variants have been identified for this gene. 

[0260] CPS 138 corresponds to S100A11 which encodes S100 calcium binding 

protein A11 (calgizzarin). The gene has LocusID: 6282, and is located on chromosome 1 
with reported cytogenetic location 1q21. The protein encoded by this gene is a member of 
the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins 
are localized in the cytoplasm and/or nucleus of a wide range of cells, and may be involved 
in the regulation of a number of cellular processes such as cell cycle progression and 
differentiation. S100 genes include at least 13 members which are located as a cluster on 
chromosome 1q21. S100A11 gene product may function in motility, invasion, and tubulin 
polymerization. Chromosomal rearrangements and altered expression of S100A11 have 
been implicated in tumor metastasis. Alternative splicing of the 5' UTR of S100A11 results 
in two gene products. 

[0261] CPS 138 also has about 88-90% sequence identity with S100A14, 

LOC222128, LOC202763 and a genomic sequence containing LOC221948. S100A14 
encodes S100 calcium binding protein A14 (calgizzarin). S100A14 has LocusID: 30013, 
and is located at chromosome 7q22-q31.1. S100A14 gene product is similar to human 
calgranulin C protein, and may belong to S100 protein family. LOC222128 encodes protein 
dpy-19, and is located at chromosome 7p15.3. LOC221948 encodes calgizzarin (S100C 
protein) (MLN 70), and is located at chromosome 7p22.3. LOC202763 encodes a protein 
similar to protein dpy-19, and is located on chromosome 17. Nucleotides 103 to 149 of 
SEQ ID NO: 131 (D38583) align with a genomic sequence on chromosome X with over 
90% sequence identity. 

[0262] CPS 139 corresponds to FKBP1B which encodes FK506 binding protein 1B 

(12.6 kD). The gene has LocusID: 2281, and is located on chromosome 2 with reported 
cytogenetic location 2p23.3. The protein encoded by this gene is a member of the 
immunophilin protein family. This family of proteins may play a role in immunoregulation 
and basic cellular processes involving protein folding and trafficking. FKBP1B gene 
product is a cis-trans prolyl isomerase that can bind the immunosuppressants FK506 and 
rapamycin. It is similar to the FK506-binding protein 1A. Its physiological role is thought 
to be in the excitation-contraction coupling in cardiac muscle. There are at least two 
alternatively spliced transcript variants of this gene encoding different isoforms. 
[0263] CPS 139 also has about 83% sequence identity with an intron sequence of 

LOC145581. LOC145581 encodes a protein similar to hypothetical protein MGC2656, and 
is located at chromosome 14q13.3. 

[0264] CPS 141 corresponds to RNAH which encodes RNA helicase family. The 

gene has LocusID: 10973, and is located on chromosome 6 with reported cytogenetic 
location 6q16. CPS 141 is located in the 3' untranslated region of the gene. 
[0265] CPS 142 corresponds to MYL9 (MYRL2) which encodes myosin, light 

polypeptide 9, regulatory. The gene has LocusID: 10398, and is located on chromosome 20 
with reported cytogenetic location 20q11.22. The gene product is also known as myosin 
regulatory light chain 2. The gene product may regulate ATPase activity of myosin heads, 
and is a member of a protein family that regulates myosin activity. 

[0266] CPS 143 corresponds to SPOP which encodes speckle-type POZ protein. 

The gene has LocusID: 8405, and is located on chromosome 17 with reported cytogenetic 
location 17q22. The gene product is an autoantigenic protein and may be a DNA or actin 
binding protein. The product contains a POZ domain, and may mediate protein-protein 
interactions. 

[0267] CPS 144 corresponds to the 3' untranslated region of SLC11A1 which 

encodes solute carrier family 11 (proton-coupled divalent metal ion transporters), member 
1. The gene has LocusID: 6556, and is located on chromosome 2 with reported cytogenetic 
location 2q35. The gene product is similar to murine Bcg (Nramp1), and may control 
antimicrobial activity of macrophages. 

[0268] CPS 145 corresponds to SIAH2 which encodes seven in absentia homolog 2 

(Drosophila). The gene has LocusID: 6478, and is located on chromosome 3 with reported 
cytogenetic location 3q25. The gene product may be a negative regulator of Vav and DCC 
mediated signaling pathways. 

[0269] CPS 146 corresponds to S100P which encodes S100 calcium binding protein 

P. The gene has LocusID: 6286, and is located on chromosome 4 with reported cytogenetic 
location 4p16. The protein encoded by this gene is a member of the S100 family of proteins 
containing 2 EF-hand calcium-binding motifs. S100 proteins are localized in the cytoplasm 
and/or nucleus of a wide range of cells, and are involved in the regulation of a number of 
cellular processes such as cell cycle progression and differentiation. S100 genes include at 
least 13 members which are located as a cluster on chromosome 1q21. However, S100P is 
located at chromosome 4p16. S100P protein, in addition to binding Ca2+, also binds Zn2+ 
and Mg2+. This protein may play a role in the etiology of prostate cancer. 
[0270] CPS 147 corresponds to TNNT1 which encodes troponin T1, skeletal, slow. 

The gene has LocusID: 7138, and is located on chromosome 19 with reported cytogenetic 
location 19q13.4. The gene product is also known as troponin T1, tropomyosin-binding 
subunit of troponin, or slow twitch skeletal muscle regulatory protein. 

[0271] Nucleotides 15639 to 15571 of SEQ ID NO: 139 (AJ011712) have 84% 

sequence identity with a chromosomal region at 4q32.3. Nucleotides 15562 to 15604 of 
SEQ ID NO: 139 have 93% sequence identity with a chromosomal region near TRAF6. 
TRAF6 encodes TNF receptor-associated factor 6, and has LocusID: 7189. TRAF6 is 
located at chromosome 11p11.2. 

[0272] CPS 148 corresponds to KIAA0750 which encodes KIAA0750 gene product. 

The gene has LocusID: 9645, and is located on chromosome 11 with reported cytogenetic 
location 11p15.2. 

[0273] CPS 149 corresponds to FOS which encodes v-fos FBJ murine osteosarcoma 

viral oncogene homolog. The gene has LocusID: 2353, and is located on chromosome 14 
with reported cytogenetic location 14q24.3. The Fos gene family consists of at least four 
members: FOS, FOSB, FOSL1, and FOSL2. These genes encode leucine zipper proteins 
that can dimerize with proteins of the JUN family, thereby forming the transcription factor 
complex AP-1. As such, the FOS proteins have been implicated as regulators of cell 
proliferation, differentiation, and transformation. In some cases, expression of the FOS 
gene has been associated with apoptotic cell death. FOS gene product may function as a 
transcription factor. It may also be involved in regulation of DNA methylation. The 
chromosomal region that aligns with CPS 149 also contains LOC196923. LOC196923 
encodes a protein similar to proto-oncogene protein c-fos (cellular oncogene fos) (G0/G1 
switch regulatory protein 7). 

[0274] Nucleotides 1 to 6210 of SEQ ID NO: 141 (K00650) also align with a 

chromosomal region on chromosome 14 with 99% sequence identity. This chromosomal 
region includes LOC196937, LOC196936 and LOC196935. All of these three putative 
genes have reported cytogenetic location 14q23.2. LOC196936 encodes a protein similar to 
proto-oncogene protein c-fos (cellular oncogene fos) (G0/G1 switch regulatory protein 7). 
LOC196935 encodes a protein similar to proto-oncogene protein c-fos (cellular oncogene 
fos) (G0/G1 switch regulatory protein 7). 

[0275] CPS 150 corresponds to SERPINB2 (PAI2) which encodes serine (or 

cysteine) proteinase inhibitor, clade B (ovalbumin), member 2. The gene has LocusID: 
5055, and is located on chromosome 18 with reported cytogenetic location 18q21.3. The 
gene product is known as plasminogen activator inhibitor, type II (arginine-serpin). It is a 
member of the serpin family of serine protease inhibitors. Alternative names for this gene 
product include PAI or PLANH2. 

[0276] CPS 151 corresponds to PDXK which encodes pyridoxal (pyridoxine, 

vitamin B6) kinase. The gene has LocusID: 8566, and is located on chromosome 21 with 
reported cytogenetic location 21q22.3. 

[0277] CPS 152 can be derived from Homo sapiens mRNA or cDNA 

DKFZp564D113 (from clone DKFZp564D113). CPS 152 corresponds to a hypothetical gene 
UNK_AL049250 which represents gene or genes that produce the RNA transcripts capable 
of hybridizing under stringent conditions to CPS 152. CPS 152 aligns to various 
chromosomal regions with 97-98% sequence identity. One region includes LOC196123 
which is located in an intron sequence of LOC143518. LOC143518 is located on 
chromosome 11. Another region is located at chromosome 16p12.1 and includes or 
overlaps LOC146384, LOC197204, and LOC146136. LOC146136 encodes a protein 
similar to nuclear pore complex interacting protein. A third region is also located at 
chromosome 16p12.1, and overlaps LOC220548 which encodes hypothetical protein 
KIAA0220. A fourth region is next to KIAA0220 which encodes KIAA0220 protein and is 
located at chromosome 16p12.1. A fifth region is at 16p12.2, and next to LOC146172. A 
sixth region is on chromosome 7 and includes or overlaps LOC202736, LOC154729, and 
LOC154725. LOC154729 encodes a protein similar to nuclear pore complex interacting 
protein. LOC154725 encodes a protein similar to hypothetical protein KIAA0220. A 
seventh region is near LOC146385 which is located at chromosome 16q13. An eighth 
region includes LOC197445 which is also located at chromosome 16q13 and encodes a 
protein similar to BTG3 associated nuclear protein, isoform a (BANP homolog or SMAR1 
homolog). A ninth region is at 16q22.3 and includes LOC146452 which encodes a protein 
similar to KIAA0251 hypothetical protein. A tenth region is at 16p13.2, and aligns with 
putative gene LOC146613. An eleventh region is located 5' to the polypeptide-coding 
sequence of NPIP. NPIP encodes a nuclear pore complex interacting protein, and has 
LocusID: 9284. NPIP is located at chromosome 16p13-p11. Yet another region is located 
near LOC124155. LOC124155 encodes a protein similar to nuclear pore complex 
interacting protein, and is located at chromosome 16p11.2. Other regions include 
LOC197366 at 16p11.2, KIAA0370 at 16p12.1-p11.2, LOC146130 at 16p11.1, and 
LOC197362 at 16p11.2. 

[0278] In addition, CPS 152 has about 97% sequence identity with BANP. BANP 

encodes BTG3 associated nuclear protein, and has LocusID: 54971. The gene is located at 
chromosome 18. BTG3 is a protein that interacts with CAF1 which is a component of the 
general transcription multisubunit complex. It is thought that BTG3 is involved in negative 
control of the cell cycle. The protein encoded by BANP can bind to BTG3. Studies with 
mouse homolog suggest that this encoded protein may also interact with a specific nuclear 
matrix/scaffold-associated region (MAR). Transcript variants encoding different isoforms 
have been described for BANP gene. 

[0279] CPS 152 also aligns with LOC118735 with about 92% sequence identity. 

LOC118735 encodes a protein similar to apoptosis response protein or prostate apoptosis 
response protein 4. This gene is located on chromosome 10 with reported cytogenetic 
location 10q24.2. 

[0280] Furthermore, fragments of AL049250 (SEQ ID NO: 144) align with other 

chromosomal regions with about 78-85% sequence identity. For instance, nucleotides 182 
to 2011 of AL049250 align with a genomic sequence near LOC139011. LOC139011 
encodes a protein similar to Arabidopsis thaliana DNA-directed RNA polymerase (EC 
2.7.7.6) II largest chain (JDMU1). LOC139011 is located at chromosome 11p15.5. 
Nucleotides 1720 to 2185 of SEQ ID NO: 144 (AL049250) align with LOC220178 which 
has sequence similarity to rat kidney-specific (KS) gene and is located at chromosome 
10q23.2. Nucleotides 1463 to 1911 of SEQ ID NO: 144 align with CECR7 which encodes 
cat eye syndrome chromosome region, candidate 7. CECR7 has LocusID: 27438, and is 
located on chromosome 22. Moreover, nucleotides 1483 to 1943 of SEQ ID NO: 144 align 
with LOC204354 which encodes a protein similar to SA rat hypertension-associated 
homolog and is located on chromosome 15. Nucleotides 1483 to 1943 of SEQ ID NO: 144 
align with BUCS1 which encodes butyryl Coenzyme A synthetase 1. BUCS1 has LocusID: 
116285, and is located on chromosome 16 with reported cytogenetic location 16p12.2. 

[0281] CPS 153 corresponds to GRO2 which encodes GRO2 oncogene. The gene 

has LocusID: 2920, and is located on chromosome 4 with reported cytogenetic location 
4q21. The gene product may be a chemotactic agent for polymorphonuclear leukocytes. 

[0282] CPS 153 also aligns with GRO1 with about 85% sequence identity. GRO1 

represents GRO1 oncogene (melanoma growth stimulating activity, alpha). The gene has 
LocusID: 2919, and is located on chromosome 4. The gene product has melanoma growth 
stimulating activity, and may be a mitogenic factor involved in inflammatory processes. 

[0283] In addition, nucleotides 2 to 298 of M36820 (SEQ ID NO: 145) have about 

89-94% sequence identity with GRO3. GRO3 represents GRO3 oncogene, and has 
LocusID: 2921. The gene is located at chromosome 4q21. GRO3 gene product may be a 
mitogenic factor. Nucleotides 184-299 of SEQ ID NO: 145 (M36820) have 91% sequence 
identity with LOC201963. LOC201963 encodes a protein similar to heterogeneous nuclear 
ribonucleoprotein A1 (helix-destabilizing protein) (single-strand binding protein) (hnRNP 
core protein A1) (HDP). LOC201963 is located at chromosome 4q13.3. 
[0284] CPS 154 corresponds to INPP4A which encodes inositol polyphosphate-4- 

phosphatase, type I, 107kD. The gene has LocusID: 3631, and is located on chromosome 2 
with reported cytogenetic location 2q11.2. INPP4A gene product is involved in 
phosphatidylinositol signaling pathways. This product removes the phosphate group at 
position 4 of the inositol ring from inositol 3,4-bisphosphate. 

[0285] CPS 155 corresponds to GPT which encodes glutamic-pyruvate transaminase 

(alanine aminotransferase). The gene has LocusID: 2875, and is located on chromosome 8 
with reported cytogenetic location 8q24.3. 

[0286] Nucleotides 9 to 1550 of SEQ ID NO: 147 (U70732) align with a 

chromosomal region with 96% sequence identity. The chromosomal region is located 3' to 
FBXL6. FBXL6 encodes F-box and leucine-rich repeat protein 6, and has LocusID: 26233. 
FBXL6 is located at chromosome 8q24.3. FBXL6 encodes a member of the F-box protein 
family which is characterized by an approximately 40 amino acid motif, the F-box. 
Nucleotides 1962 to 2110 of SEQ ID NO: 147 have 83% sequence identity with GPT2 
which encodes glutamic pyruvate transaminase (alanine aminotransferase) 2. GPT2 has 
LocusID: 84706, and is located on chromosome 16. 

[0287] CPS 156 corresponds to MYL4 which encodes myosin, light polypeptide 4, 

alkali; atrial, embryonic. The gene has LocusID: 4635, and is located on chromosome 17 
with reported cytogenetic location 17q21-qter. Myosin is a hexameric ATPase cellular 
motor protein. It is composed of two myosin heavy chains, two nonphosphorylatable 
myosin alkali light chains, and two phosphorylatable myosin regulatory light chains. 
MYL4 encodes a myosin alkali light chain that is found in embryonic muscle and adult 
atria. MYL4 gene product may modulate the interaction between myosin and actin. It is a 
member of a family of myosin and actin regulatory proteins.

[0288] CPS 157 corresponds to NFE2 which encodes nuclear factor (erythroid- 

derived 2), 45kD. The gene has LocusID: 4778, and is located on chromosome 12 with 
reported cytogenetic location 12q13. NFE2 gene product is a 45 kD subunit of the bZIP
dimeric transcription factor. The transcription factor may regulate expression of the beta 
globin gene (HBB). CPS 157, as well as NFE2, is located within an intron of ATF7.
ATF7 encodes activating transcription factor 7, and has LocusID: 11016. ATF7 is located 
at chromosome 12q13. The gene product is a leucine zipper DNA-binding protein, and may
recognize a cAMP response element (CRE). The gene product may also be involved in the 
regulation of adenovirus E1a-responsive and cellular cAMP-inducible promoters.
[0289] CPS 158 corresponds to POLR2J which encodes polymerase (RNA) II 

(DNA directed) polypeptide J (13.3kD). The gene has LocusID: 5439, and is located on 
chromosome 7 with reported cytogenetic location 7q11.2. This gene encodes a subunit of
RNA polymerase II, the polymerase responsible for synthesizing messenger RNA in 
eukaryotes. The product of this gene exists as a heterodimer with another polymerase 
subunit, and the heterodimer forms a core subassembly unit of the polymerase. Two similar
genes are located nearby at chromosome 7q11.2 and another similar locus is found at
chromosome 7p15.

[0290] Nucleotides 11 to 382 of SEQ ID NO: 150 (L37127) have 94% sequence 

identity with LOC245815. LOC245815, also known as POLR2J2, is a DNA directed RNA 
polymerase II polypeptide J-related gene. LOC245815 has LocusID: 246721, and is located 
at chromosome 7q11.22. Similarity to a related locus suggests that LOC245815 encodes a
subunit of RNA polymerase II. Alternative splicing of this gene results in at least three 
transcript variants encoding different isoforms. 

[0291] In addition, nucleotides 11 to 382 of L37127 have 94% sequence identity 

with a chromosomal region near LOC154696 and a chromosomal region on chromosome 7.
LOC154696 encodes a protein similar to HSPC047 protein, and is located at chromosome
7p15.1.

[0292] CPS 159 corresponds to CARM1 which encodes coactivator-associated 

arginine methyltransferase-1. The gene has LocusID: 10498, and is located on chromosome 
19 with reported cytogenetic location 19p13.2.

[0293] CPS 160 corresponds to UNK_AF038171 which is located in an intron 

sequence of LOC206073. LOC206073 is located on chromosome 4 with reported 
cytogenetic location 4q24. 

[0294] CPS 161 corresponds to RAB2 which encodes RAB2, member RAS 

oncogene family. The gene has LocusID: 5862, and is located on chromosome 8 with 
reported cytogenetic location 8q11.23. RAB2 gene product is also known as GTP-binding
protein 2, and may be involved in vesicle transport from the ER to the Golgi complex. The 
gene product is a member of the RAB-subfamily. 

[0295] Affymetrix annotation suggests that CPS 162 corresponds to 6H9A. Blast 

search against the Entrez human genome database shows that CPS 162 aligns with an intron 
sequence of MYO1E with about 94% sequence identity. MYO1E encodes myosin IE, and
has LocusID: 4643. MYO1E is located on chromosome 15 with reported cytogenetic
location 15q21-q22. MYO1E gene product is similar to class I myosin, and may bind to
proline-rich peptides. The gene product contains a Src homology 3 (SH3) domain and a myosin
head domain (motor domain). 

[0296] CPS 163 corresponds to EPB42 which encodes erythrocyte membrane 

protein band 4.2. The gene has LocusID: 2038, and is located on chromosome 15 with 
reported cytogenetic location 15q15-q21. Erythrocyte membrane protein band 4.2 is an
ATP-binding protein which may regulate the association of protein 3 with ankyrin. It 
probably has a role in erythrocyte shape and mechanical property regulation. Mutations in 
the EPB42 gene are associated with recessive spherocytic elliptocytosis and recessively 
transmitted hereditary hemolytic anemia. 

[0297] CPS 163 also aligns with LOC203401 with about 97% sequence identity. 

LOC203401 encodes a protein similar to erythrocyte membrane protein band 4.2 (P4.2) 
(Pallidin). The chromosomal location of LOC203401 is unknown. 

[0298] CPS 164 corresponds to CGTHBA which denotes "conserved gene telomeric 

to alpha globin cluster." The gene has LocusID: 8131, and is located on chromosome 16 
with reported cytogenetic location 16p13.3.

[0299] CPS 165 corresponds to DOC-1R which encodes tumor suppressor deleted in 

oral cancer-related 1. The gene has LocusID: 10263, and is located on chromosome 11 with
reported cytogenetic location 11q13. The gene product is similar to hamster doc-1. CPS
165 also aligns with LOC222984 with about 89% sequence identity. LOC222984 encodes a 
protein similar to tumor suppressor deleted in oral cancer-related 1, and is located at 
chromosome 7p22.2. 

[0300] Nucleotides 3 to 663 of SEQ ID NO: 157 (AF089814) have about 86% 

sequence identity with LOC169609 and LOC169607. Both genes encode a protein similar
to Myosin Vb (Myosin 5B). LOC169609 is located at chromosome 9q12. LOC169607 is
located at chromosome 9q21.11. In addition, nucleotides 3 to 777 of AF089814 have about
86-93% sequence identity with LOC138403. LOC138403 encodes a protein similar to
Myosin Vb (Myosin 5B), and is located at chromosome 9q13.

[0301] CPS 166 corresponds to KIAA0353 (DMN) which encodes desmuslin. The

gene has LocusID: 23336. DMN is located on chromosome 15 with reported cytogenetic 
location 15q26.3. 

[0302] A fragment of CPS 166 (nucleotides 477 to 602 of AI077476) aligns with 

LOC120511 with about 97% sequence identity. LOC120511 encodes a protein similar to
rig-1 protein (mouse), and is located at chromosome 11q23.3.

[0303] Affymetrix annotation suggests that CPS 167 corresponds to CSH1. Blast 

search against the Entrez human genome database shows that CPS 167 also aligns with 
CSH2 with about 98% sequence identity. CSH2 encodes chorionic somatomammotropin 
hormone 2. The gene has LocusID: 1443, and is located on chromosome 17 with reported 
cytogenetic location 17q24.2. The protein encoded by this gene is a member of the
somatotropin/prolactin family of hormones and may play an important role in growth 
control. CSH2 is located at the growth hormone locus on chromosome 17 along with four 
other related genes in the same transcriptional orientation. This arrangement is thought to 
have evolved by a series of gene duplications. Although the five genes share a high degree 
of sequence identity, they are reported to be expressed in different tissues. Alternative 
splicing generates additional isoforms of each of the five growth hormones. CSH2 is 
expressed in the placenta and utilizes multiple transcription initiation sites. Expression of 
the mature proteins for chorionic somatomammotropin hormones 1 and 2 is upregulated 
during development. 

[0304] CPS 168 corresponds to LOC51048 (DKK3) which encodes dickkopf 

homolog 3 (Xenopus laevis) (RIG-like 5-6). The gene has LocusID: 27122, and is located 
on chromosome 11 with reported cytogenetic location 11p15.2. DKK3 gene product is also
known as RIG-like 7-1, and may be related to proteins that antagonize Wnt signaling. 
[0305] Nucleotides 3 to 92 of SEQ ID NO: 160 (AF034209) have about 90% 

sequence identity with RIG (regulated in glioma). RIG has LocusID: 10530, and is located 
at chromosome 11p15.1.

[0306] CPS 169 corresponds to SELP which encodes selectin P (granule membrane 

protein 140kD, antigen CD62). The gene has LocusID: 6403, and is located on 
chromosome 1 with reported cytogenetic location 1q22-q25. SELP gene product is a
platelet alpha-granule membrane protein of molecular weight 140,000 that redistributes to 
the plasma membrane during platelet activation and degranulation. It is a member of a 
family of adhesion/homing receptors. Alternative splice variants may occur but are not well 
documented. The gene product may mediate interactions of leukocytes with the blood 
vessel wall. It contains an EGF domain and complement regulatory (CR) protein domains. 
[0307] CPS 170 corresponds to RAP1GA1 which encodes GTPase activating 

protein 1 for RAP1. The gene has LocusID: 5909, and is located on chromosome 1 with 
reported cytogenetic location 1p36.1-p35. Nucleotides 916 to 1044 of SEQ ID NO: 162
(M64788) have about 85% identity with KIAA1039. KIAA1039 encodes KIAA1039
protein, and has LocusID: 23108. The gene has reported cytogenetic location 17p13.3.
[0308] CPS 171 corresponds to THBS1 which encodes thrombospondin 1. The 

gene has LocusID: 7057, and is located on chromosome 15 with reported cytogenetic 
location 15q15. Thrombospondin-1 may have a role in blood clotting and in angiogenesis.
It is a member of a family of adhesive molecules. 

[0309] CPS 172 corresponds to CHRNA4 which encodes cholinergic receptor, 

nicotinic, alpha polypeptide 4. The gene has LocusID: 1137, and is located on chromosome
20 with reported cytogenetic location 20q13.2-q13.3. Nucleotides 615 to 1995 of SEQ ID
NO: 164 (U62433) also align with LOC149656. LOC149656 encodes a protein similar to 
neuronal acetylcholine receptor protein, alpha-4 chain precursor, and is located at 
chromosome 20q13.33.

[0310] Fragments of nucleotides 602 to 1313 of U62433 (SEQ ID NO: 164) align 

with CHRNA2, CHRNA3 and CHRNB2 with about 79-89% sequence identity. CHRNA2 
encodes cholinergic receptor, nicotinic, alpha polypeptide 2 (neuronal). CHRNA2 has 
LocusID: 1135, and is located at chromosome 8p21. CHRNA3 encodes cholinergic 
receptor, nicotinic, alpha polypeptide 3. CHRNA3 has LocusID: 1136, and is located at 
chromosome 15q24. CHRNB2 encodes cholinergic receptor, nicotinic, beta polypeptide 2
(neuronal). CHRNB2 has LocusID: 1141, and is located at chromosome 1q21.3.
[0311] CPS 173 corresponds to S100A12 which encodes S100 calcium binding

protein A12 (calgranulin C). The gene has LocusID: 6283, and is located on chromosome 1 
with reported cytogenetic location 1q21. The protein encoded by this gene is a member of
the S100 family of proteins containing 2 EF-hand calcium-binding motifs. S100 proteins
are localized in the cytoplasm and/or nucleus of a wide range of cells, and involved in the 
regulation of a number of cellular processes such as cell cycle progression and 
differentiation. S100 genes include at least 13 members which are located as a cluster on
chromosome 1q21. S100A12 gene product is proposed to be involved in specific
calcium-dependent signal transduction pathways, and its regulatory effect on cytoskeletal
components may modulate various neutrophil activities. 

[0312] CPS 174 corresponds to CD9 which encodes CD9 antigen (p24). The gene 

has LocusID: 928, and is located on chromosome 12 with reported cytogenetic location 
12p13.3. The protein encoded by this gene is a member of the transmembrane 4
superfamily, also known as the tetraspanin family. Most of these members are cell-surface 
proteins that are characterized by the presence of four hydrophobic domains. These 
proteins mediate signal transduction events that play a role in the regulation of cell 
development, activation, growth and motility. CD9-encoded protein is a cell surface 
glycoprotein that is known to complex with integrins and other transmembrane 4 
superfamily proteins. It can modulate cell adhesion and migration and also trigger platelet
activation and aggregation. In addition, the encoded protein appears to promote muscle cell 
fusion and support myotube maintenance. 

[0313] CPS 175 corresponds to PRDX2 (TDPX1) which encodes peroxiredoxin 2. 

Peroxiredoxin 2 is also known as thioredoxin-dependent peroxide reductase (thiol-specific 
antioxidant 1, natural killer-enhancing factor B), and may be protective against oxidative 
stress. PRDX2 gene has LocusID: 7001, and is located on chromosome 19 with reported 
cytogenetic location 19p13.2.

[0314] CPS 175 has about 88% sequence identity with MGC2599 and LOC134602.

MGC2599 encodes hypothetical protein MGC2599 which is similar to katanin p60 subunit
A 1 2599. The gene has LocusID: 84056, and is located at chromosome 13q12.2.
LOC134602 encodes a protein similar to thiol-specific antioxidant (TSA), and is located at
chromosome 6q21. 

[0315] Nucleotides 497 to 767 of SEQ ID NO: 167 (L19185) align with 

LOC219772 with 89% sequence identity. LOC219772 encodes peroxiredoxin 2
(thioredoxin peroxidase 1) (thioredoxin-dependent peroxide reductase 1) (thiol-specific 
antioxidant protein) (TSA) (PRP) (Natural killer cell enhancing factor B) (NKEF-B). 
LOC219772 is located at chromosome 10q11.21. Moreover, nucleotides 5 to 65 of L19185
show 100% sequence identity with LOC204141 and LOC205227. LOC204141 is similar to 
H-NUC (human), and is located on chromosome 13. LOC205227 encodes a protein similar 
to malonyl-CoA decarboxylase (EC 4.1.1.9) (goose), and is located on chromosome 2. 
[0316] CPS 176 corresponds to B7 which encodes B7 protein. The gene has 

LocusID: 10233, and is located on chromosome 12 with reported cytogenetic location 
12p13. B7 protein has a low sequence similarity to the regulatory subunit of protein
phosphatases. B7 protein contains leucine rich repeats, and may mediate protein-protein 
interactions. 

[0317] CPS 177 corresponds to BPGM which encodes 2,3-bisphosphoglycerate 

mutase. The gene has LocusID: 669, and is located on chromosome 7 with reported 
cytogenetic location 7q31-q34. 2,3-bisphosphoglycerate mutase has synthase, mutase, and 
phosphatase activities. It is involved in controlling 2,3-diphosphoglycerate metabolism. 
[0318] CPS 178 corresponds to PSMA7 which encodes proteasome (prosome, 

macropain) subunit, alpha type, 7. The gene has LocusID: 5688, and is located on 
chromosome 20 with reported cytogenetic location 20q13.33. Alpha subunit 7 of the
proteasome (prosome macropain) is a possible target for hepatitis B virus X protein. 
[0319] CPS 179 corresponds to GMPR which encodes guanosine monophosphate 

reductase. The gene has LocusID: 2766, and is located on chromosome 6 with reported 
cytogenetic location 6p23. Guanosine monophosphate reductase may facilitate 
thermogenesis, and has very strong similarity to rat guanosine monophosphate reductase. 
[0320] CPS 180 corresponds to TMOD which encodes tropomodulin. The gene has 

LocusID: 7111, and is located on chromosome 9 with reported cytogenetic location 9q22.3. 
Tropomodulin can bind to an end of erythrocyte tropomyosin. 

[0321] CPS 181 corresponds to C4A which encodes complement component 4A. 

The gene has LocusID: 720. The gene is located on chromosome 6. This gene encodes the 
acidic form of complement factor 4, part of the classical activation pathway. The gene 
product is expressed as a single chain precursor which is proteolytically cleaved into a 
trimer of alpha, beta, and gamma chains prior to secretion. The trimer provides a surface 
for interaction between the antigen-antibody complex and other complement components. 
The alpha chain may be cleaved to release C4 anaphylatoxin, a mediator of local 
inflammation. Deficiency of complement component 4A is associated with systemic lupus 
erythematosus and type I diabetes mellitus. C4A gene localizes to the major 
histocompatibility complex (MHC) class III region on chromosome 6. Varying haplotypes 
of this gene cluster exist, such that individuals may have 1, 2, or 3 copies of this gene.
[0322] Fragments of CPS 181 (nucleotides 1 to 45 and nucleotides 199 to 248 of 

SEQ ID NO: 173) also align with LOC220819 with 100% sequence identity. LOC220819 
encodes a protein similar to dJ34F7.4 (complement component 4A). LOC220819 is located 
on chromosome 6. 

[0323] In addition, CPS 181 aligns with C4B with over 94% sequence identity. 

C4B encodes complement component 4B, and has LocusID: 721. C4B is located at 
chromosome 6p21.3. C4B gene encodes the basic form of complement factor 4, part of the 
classical activation pathway. This gene exists as a long form and a short form due to the 
presence or absence of a 6.4 kb endogenous HERV-K retrovirus in intron 9. 
[0324] CPS 182 corresponds to GPR12 which encodes G protein-coupled receptor 

12. The gene has LocusID: 2835, and is located on chromosome 13 with reported 
cytogenetic location 13ql2. The gene product is a member of the G protein-coupled 
receptor family. It is similar to murine Gpcr12 and rat Rn.10218.
[0325] CPS 182 also aligns with a sequence near LOC202175 with 97% sequence

identity. LOC202175 is located at chromosome 5p15.33.

[0326] CPS 183 corresponds to ADFP which encodes adipose differentiation-related 

protein. The gene has LocusID: 123, and is located on chromosome 9 with reported 
cytogenetic location 9p21.2. Adipocyte differentiation-related protein is associated with the 
globule surface membrane material. This protein is a major constituent of the globule 
surface. Increase in mRNA levels is one of the earliest indications of adipocyte 
differentiation. The protein is a component of milk lipid globules. The protein is also 
known as adipophilin. 

[0327] Nucleotides 1 to 1314 of SEQ ID NO: 175 (X97324) have 91-92% sequence 

identity with ILF2 which encodes interleukin enhancer binding factor 2, 45kD. ILF2 has 
LocusID: 3608, and is located at chromosome 1q21.1. The gene product is a subunit of
nuclear factor of activated T-cells (NF-AT). It is a DNA-binding transcription factor. 
[0328] CPS 184 corresponds to MYL5 which encodes myosin, light polypeptide 5, 

regulatory. The gene has LocusID: 4636, and is located on chromosome 4 with reported 
cytogenetic location 4p16.3. This gene encodes one of the myosin light chains, a
component of the hexameric ATPase cellular motor protein myosin. Myosin is composed
of two heavy chains, two nonphosphorylatable alkali light chains, and two phosphorylatable
regulatory light chains. This gene product, one of the regulatory light chains, is expressed in
fetal muscle and in adult retina, cerebellum, and basal ganglia. The gene product may
modulate the interaction between myosin and actin. It is a member of a family of myosin
and actin regulatory proteins. 

[0329] CPS 185 corresponds to DPM2 which encodes dolichyl-phosphate 

mannosyltransferase polypeptide 2, regulatory subunit. The gene has LocusID: 8818, and is 
located on chromosome 9 with reported cytogenetic location 9q34.13. 

[0330] CPS 186 corresponds to MCC which encodes a protein mutated in colorectal 

cancers. The gene has LocusID: 4163, and is located on chromosome 5 with reported 
cytogenetic location 5q21-q22. MCC is a candidate for the putative colorectal tumor 
suppressor gene. The MCC gene product may be involved in early stages of colorectal 
neoplasia in both sporadic and familial tumors. The gene product is similar to the G 
protein-coupled m3 muscarinic acetylcholine receptor. 

[0331] CPS 187 corresponds to F3 which encodes coagulation factor III 

(thromboplastin, tissue factor). The gene has LocusID: 2152, and is located on 
chromosome 1 with reported cytogenetic location 1p22-p21. This gene encodes coagulation
factor III which is a cell surface glycoprotein. This factor enables cells to initiate the blood 
coagulation cascades, and it functions as the high-affinity receptor for the coagulation factor 
VII. The resulting complex provides a catalytic event that is responsible for initiation of the 
coagulation protease cascades by specific limited proteolysis. Unlike some of the other
cofactors of these protease cascades, which circulate as nonfunctional precursors, 
coagulation factor III is a potent initiator that is fully functional when expressed on cell 
surfaces. There are 3 distinct domains of this factor: extracellular, transmembrane, and 
cytoplasmic. Coagulation factor III can initiate the coagulation protease cascade assembly 
and propagation, and may function in normal hemostasis. The factor is a component of the 
cellular immune response. 

[0332] CPS 188 corresponds to KLF1 which encodes Kruppel-like factor 1 

(erythroid). The gene has LocusID: 10661, and is located on chromosome 19 with reported 
cytogenetic location 19p13.13-p13.12. Erythroid Kruppel-like factor 1 is a transcriptional
activator of the adult beta-globin promoter. 

[0333] CPS 188 also aligns to LOC146544 with about 94% sequence identity. 

LOC146544 is located on chromosome 16.

[0334] CPS 189 corresponds to HBG2. HBG2 encodes hemoglobin, gamma G. 

The gene has LocusID: 3047, and is located on chromosome 11 with reported cytogenetic 
location 11p15.5. HBG1 is also located in the same chromosomal region. The gamma
globin genes (HBG1 and HBG2) are normally expressed in the fetal liver, spleen and bone 
marrow. Two gamma chains together with two alpha chains constitute fetal hemoglobin 
(HbF) which is normally replaced by adult hemoglobin (HbA) at birth. In some
beta-thalassemias and related conditions, gamma chain production continues into adulthood. The
two types of gamma chains differ at residue 136 where glycine is found in the G-gamma
product (HBG2) and alanine is found in the A-gamma product (HBG1). The former is
predominant at birth. The order of the genes in the beta-globin cluster is:
5'-epsilon - gamma-G - gamma-A - delta - beta-3'. The gene product(s) can transport oxygen and
carbon dioxide between lung and tissues. 

[0335] A fragment of CPS 189 (nucleotides 332..234 of SEQ ID NO: 181) has 86% 

sequence identity with HBE1 which encodes hemoglobin, epsilon 1.

[0336] In addition, SEQ ID NO: 277 (M91036) can be used to design probes for 

detecting HBG2. Nucleotides 2162-2268, 2391-2614 and 3501-3565 of SEQ ID NO: 277 
align to HBG2 with 100% sequence identity. Nucleotides 2379 to 2626 and 7309 to 7556
of SEQ ID NO: 277 have 87% sequence identity with HBE1 which encodes hemoglobin,
epsilon 1. HBE1 gene has LocusID: 3046, and is located at chromosome 11p15.5.
Nucleotides 2384 to 2621 and 7314 to 7551 of SEQ ID NO: 277 also have 84% sequence
identity with a chromosomal region on chromosome 11.
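
By way of illustration only, the percent sequence identity figures recited for nucleotide ranges such as those above can be understood as the fraction of identical positions within an aligned fragment. The following is a minimal sketch under stated assumptions: the function names `subsequence` and `percent_identity` and the toy sequences are hypothetical and are not taken from the sequence listing, and the sketch handles only ungapped, pre-aligned fragments of equal length. It does not reproduce the gapped alignment or scoring performed by the BLAST searches described herein.

```python
def subsequence(seq: str, start: int, end: int) -> str:
    """Return nucleotides start..end of seq, using 1-based inclusive
    coordinates as in 'nucleotides 9 to 1550 of SEQ ID NO: ...'."""
    if start < 1 or end > len(seq) or start > end:
        raise ValueError("coordinates out of range")
    return seq[start - 1:end]


def percent_identity(query: str, subject: str) -> float:
    """Percent of identical positions in two pre-aligned, ungapped
    fragments of equal length (hypothetical helper, illustration only)."""
    if len(query) != len(subject):
        raise ValueError("aligned fragments must have equal length")
    if not query:
        raise ValueError("aligned fragments must be non-empty")
    matches = sum(1 for q, s in zip(query.upper(), subject.upper()) if q == s)
    return 100.0 * matches / len(query)


# Toy example: a 10-nucleotide fragment with a single mismatch.
probe = "TTACGTACGTACGG"          # hypothetical probe sequence
target = "ACGTTCGTAC"            # hypothetical aligned genomic fragment
fragment = subsequence(probe, 3, 12)
identity = percent_identity(fragment, target)
```

In this toy case the fragment is nucleotides 3 to 12 of the hypothetical probe, and nine of its ten positions match the target.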

[0337] CPS 190 corresponds to GRO3 which encodes GRO3 oncogene. The gene

has LocusID: 2921, and is located on chromosome 4 with reported cytogenetic location
4q21. The gene product may be a mitogenic factor.

[0338] Nucleotides 6 to 298 of SEQ ID NO: 182 (M36821) have about 86-95% 

sequence identity with GRO1 and GRO2. GRO1 encodes GRO1 oncogene (melanoma
growth stimulating activity, alpha), and has LocusID: 2919. GRO1 is located at
chromosome 4q21. GRO1 gene product has melanoma growth stimulating activity, and
may be a mitogenic factor involved in inflammatory processes. GRO2 encodes GRO2
oncogene, and has LocusID: 2920. GRO2 is located at chromosome 4q21. GRO2 gene
product may be a chemotactic agent for polymorphonuclear leukocytes. 
[0339] Affymetrix annotation suggests that CPS 191 corresponds to PLEC1. Blast

search against the Entrez human genome database shows that nucleotides 14629 to 14800 of 
SEQ ID NO: 183 (U53204) have 93% sequence identity with LOC162613 and a
chromosomal region near LOC93232. Both LOC162613 and LOC93232 are located at
chromosome 17q25.3, and encode proteins similar to KIAA1640 protein. In addition, 
nucleotides 14268 to 14800 of SEQ ID NO: 183 (U53204) align with LOC160535 with 
88% sequence identity. LOC160535 is located at chromosome 12q12.

[0340] CPS 192 corresponds to SLC16A3 which encodes solute carrier family 16 

(monocarboxylic acid transporters), member 3. The gene has LocusID: 9123, and is located 
on chromosome 17. The gene product is a member of monocarboxylate transporter family, 
and may function as a transporter. Nucleotides 34 to 945 of SEQ ID NO: 184 (U81800) 
align with LOC201281 with over 96% sequence identity. LOC201281 encodes a protein 
similar to monocarboxylate transporter, and is located at chromosome 17q25.3.
[0341] CPS 194 corresponds to FKBP8 which encodes FK506 binding protein 8 

(38kD). The gene has LocusID: 23770, and is located on chromosome 19 with reported 
cytogenetic location 19p12. The protein encoded by this gene is a member of the
immunophilin protein family, whose members play a role in immunoregulation and basic cellular
processes involving protein folding and trafficking. The encoded protein does not seem to 
have PPIase/rotamase activity. It has a three-unit tetratricopeptide repeat and a consensus
leucine-zipper repeat, and may have a role in neurons associated with memory function. 
[0342] CPS 194 also aligns with an intron sequence of PPP1R12B with about 88% 

sequence identity. PPP1R12B encodes protein phosphatase 1, regulatory (inhibitor) subunit 
12B. The gene has LocusID: 4660, and is located on chromosome 1 with reported 
cytogenetic location 1q32.1. Myosin light chain phosphatase (MLCP) consists of three
subunits: the catalytic subunit, the large subunit/myosin binding subunit (MBS) and the
small subunit (sm-M20). PPP1R12B is a multi-functional gene which encodes both MBS 
and sm-M20. MLCP regulates myosins and the dephosphorylation is enhanced by the 
presence of MBS. The sm-M20 subunit is suggested to play a regulatory role in muscle 
contraction by binding to MBS. MBS is also encoded by another gene, myosin light chain 
phosphatase target subunit 1. Although both MBSs increase the activity of MLCP, myosin 
light chain phosphatase target subunit 1-MBS is a more efficient activator. There are at least 
four alternatively spliced transcript variants of PPP1R12B described, two altering the MBS 
coding region and two altering the sm-M20 coding region.

[0343] CPS 195 corresponds to RNASE2 which encodes ribonuclease, RNase A 

family, 2 (liver, eosinophil-derived neurotoxin). The gene has LocusID: 6036, and is 
located on chromosome 14 with reported cytogenetic location 14q24-q31. Eosinophil- 
derived neurotoxin has neurotoxic and ribonuclease activities. It is a member of the 
ribonuclease superfamily. 

[0344] CPS 195 also aligns with LOC122661 with about 92% sequence identity. 

LOC122661 encodes a protein similar to nonsecretory ribonuclease precursor (ribonuclease
US) (eosinophil-derived neurotoxin) (RNase UpI-2) (ribonuclease 2) (RNase 2).
LOC122661 is located at chromosome 14q11.1. In addition, CPS 195 has about 88-94%
sequence identity with RNASE3. RNASE3 encodes ribonuclease, RNase A family, 3 
(eosinophil cationic protein). RNASE3 has LocusID: 6037, and is located at chromosome 
14q24-q31. RNASE3 gene product has neurotoxic and ribonuclease activities. It is a 
member of the ribonuclease superfamily. 

[0345] Nucleotides 639 to 735 of SEQ ID NO: 186 (X55988) show 95% sequence 

identity with an intron sequence of LOC159655. LOC159655 is located at chromosome 
10q23.33. 

[0346] CPS 196 corresponds to BCAT1 which encodes branched chain 

aminotransferase 1, cytosolic. The gene has LocusID: 586, and is located on chromosome 
12 with reported cytogenetic location 12pter-q12. The lack of the cytosolic enzyme
branched-chain amino acid transaminase (BCT) causes cell growth inhibition. There may 
be 2 different clinical disorders due to a defect of branched-chain amino acid 
transamination: hypervalinemia and hyperleucine-isoleucinemia. Cytosolic branched-chain 
amino acid aminotransferase 1 catalyzes conversion of branched-chain alpha-keto acids to
L-amino acids.

[0347] CPS 199 corresponds to SPP1 which encodes secreted phosphoprotein 1 

(osteopontin, bone sialoprotein I, early T-lymphocyte activation 1). The gene has LocusID: 
6696, and is located on chromosome 4 with reported cytogenetic location 4q21-q25. 
Osteopontin (bone sialoprotein) is a bone and blood vessel extracellular matrix protein 
involved in calcification and atherosclerosis. 

[0348] CPS 201 corresponds to GRO1 which encodes GRO1 oncogene (melanoma

growth stimulating activity, alpha). The gene has LocusID: 2919, and is located on 
chromosome 4 with reported cytogenetic location 4q21. The gene product has melanoma 
growth stimulating activity, and may be a mitogenic factor involved in inflammatory 
processes. 

[0349] CPS 201 also aligns with GRO2, which encodes GRO2 oncogene, with

87-89% sequence identity. GRO2 has LocusID: 2920, and is located at chromosome 4q21.
GRO2 may be a chemotactic agent for polymorphonuclear leukocytes.

[0350] Nucleotides 1 to 830 of SEQ ID NO: 189 (X54489) have about 90% 

sequence identity with GRO3 which encodes GRO3 oncogene. GRO3 has LocusID: 2921,
and is located at chromosome 4q21. GRO3 gene product may be a mitogenic factor.
Nucleotides 2 to 466 of SEQ ID NO: 189 have 85% sequence identity with LOC201963
which encodes a protein similar to heterogeneous nuclear ribonucleoprotein A1
(helix-destabilizing protein) (single-strand binding protein) (hnRNP core protein A1) (HDP).
LOC201963 is located at chromosome 4q13.3.

[0351] CPS 202 corresponds to FLJ21588 (DKFZP586O0223) which encodes

ASC-1 complex subunit P100. The gene has LocusID: 84164, and is located on chromosome 22
with reported cytogenetic location 22q12.1.

[0352] CPS 205 corresponds to FASN which encodes fatty acid synthase. The gene 

has LocusID: 2194, and is located on chromosome 17 with reported cytogenetic location 
17q25. The enzyme encoded by this gene is a multifunctional protein. One of its functions 
is to catalyze the synthesis of palmitate from acetyl-CoA and malonyl-CoA, in the presence 
of NADPH, into long-chain saturated fatty acids. In some cancer cell lines, this protein has
been found to be fused with estrogen receptor-alpha (ER-alpha), in which the N-terminus of 
FAS is fused in-frame with the C-terminus of ER-alpha. 

[0353] Nucleotides 7777 to 8199 and 8270 to 8457 of SEQ ID NO: 192 (U29344) 

have about 94-96% sequence identity with LOC133934. The gene is a hypothetical gene, 
and is located at chromosome 5p15.2. Nucleotides 7528 to 8223 of SEQ ID NO: 192 show
84% sequence identity with an intron sequence of LY9 which encodes lymphocyte antigen 
9. LY9 has LocusID: 4063, and is located at chromosome 1q21.3-q22. Lymphocyte
antigen 9 may be involved in adhesion between T cells and accessory cells. It is a member 
of the immunoglobulin superfamily. In addition, nucleotides 8299 to 8337 of U29344 align 
with DDX27 with 97% sequence identity. DDX27 encodes DEAD/H (Asp-Glu- Ala- 
Asp/His) box polypeptide 27, and has LocusID: 55661. DDX27 is located at chromosome 
20ql3.13. DEAD box proteins, characterized by the conserved motif Asp-Glu- Ala- Asp 
(DEAD), are putative RNA helicases. They are implicated in a number of cellular 
processes involving alteration of RNA secondary structure such as translation initiation, 
nuclear and mitochondrial splicing, and ribosome and spliceosome assembly. Based on 
their distribution patterns, some members of this family are believed to be involved in 
embryogenesis, spermatogenesis, and cellular growth and division. DDX27 encodes a 
DEAD box protein which is a member of the DEAD/DEAH box ATP-dependent RNA or 
DNA helicase family. 
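
By way of illustration only, a percent sequence identity figure of the kind quoted above can be computed from an existing pairwise alignment as the number of identical aligned positions divided by the alignment length. The sketch below assumes that convention; the specification does not state the exact formula or alignment parameters used, and the function name `percent_identity` and the example sequences are hypothetical.

```python
def percent_identity(aligned_query: str, aligned_subject: str) -> float:
    """Percent identity over an existing pairwise alignment.

    Both strings must have equal length and use '-' for gap columns.
    Identity is taken as matching columns divided by total alignment
    columns. This is an assumed convention, not one stated in the source.
    """
    if len(aligned_query) != len(aligned_subject):
        raise ValueError("aligned sequences must have equal length")
    if not aligned_query:
        raise ValueError("alignment is empty")
    # Count columns where both sequences carry the same non-gap residue.
    matches = sum(
        1
        for q, s in zip(aligned_query.upper(), aligned_subject.upper())
        if q == s and q != "-"
    )
    return 100.0 * matches / len(aligned_query)


# Hypothetical 10-column alignment containing one mismatch and one gap.
example = percent_identity("ACGTACGTAC", "ACGTTCG-AC")
print(f"{example:.0f}% sequence identity")
```

Counting gap columns in the denominator is one common choice; excluding them would give a slightly higher figure for the same alignment.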

[0354] CPS 206 corresponds to HOXA1 which encodes homeo box A1. The gene

has LocusID: 3198, and is located on chromosome 7 with reported cytogenetic location
7p15.3. Homeo box A1 is a member of the homeodomain family of DNA binding proteins, and
may regulate gene expression, morphogenesis, and differentiation.

[0355] CPS 207 corresponds to HMOX1 which encodes heme oxygenase 

(decycling) 1. The gene has LocusID: 3162, and is located on chromosome 22 with 
reported cytogenetic location 22q13.1. CPS 207 aligns with nucleotides 15085942 to
15086457 of chromosome 22 with 100% sequence identity. Heme oxygenase, an essential
enzyme in heme catabolism, cleaves heme to form biliverdin, which is subsequently
converted to bilirubin by biliverdin reductase, and carbon monoxide, a putative
neurotransmitter. Heme oxygenase activity is induced by its substrate heme and by various
nonheme substances. Heme oxygenase occurs as 2 isozymes, an inducible heme
oxygenase-1 and a constitutive heme oxygenase-2. HMOX1 and HMOX2 belong to the
heme oxygenase family. 

[0356] The chromosomal region to which CPS 207 aligns is in the proximity of 

other genes. These genes include MCM5 and LOC129121. MCM5 encodes MCM5 
minichromosome maintenance deficient 5, cell division cycle 46 (S. cerevisiae). It has
LocusID: 4174, and is located at chromosome 22q13.1. The protein encoded by MCM5 is
similar to S. cerevisiae CDC46 which is involved in the initiation of DNA synthesis.
MCM5 gene product is a member of the MCM family of chromatin-binding proteins.
LOC129121 is a hypothetical gene which is located at chromosome 22q12.3.

[0357] Nucleotides 26880 to 28079 of SEQ ID NO: 194 (Z82244) align with

LOC168550 with 79% sequence identity. LOC168550 encodes a protein similar to pol
protein. LOC168550 is located at chromosome 7q36.1. Nucleotides 26774 to 28057 of
SEQ ID NO: 194 align with LOC205176 with 76% sequence identity. LOC205176 is
located at chromosome 2p12.

[0358] Affymetrix annotation suggests that CPS 208 corresponds to BNIP3. Blast 

search against the Entrez human genome database shows that CPS 208 also aligns with
LOC159348 with over 98% sequence identity. LOC159348 is located on chromosome 10
with reported cytogenetic location 10q26.3. In addition, CPS 208 aligns with a
chromosomal region on chromosome 14 with about 97% sequence identity. CPS 208 also
has about 81% sequence identity with an intron sequence of LOC146062. LOC146062
encodes a protein similar to FLJ00088 protein, and is located at chromosome 15q14.

[0359] Nucleotides 152 to 1081 of SEQ ID NO: 195 (AF002697) align with a

chromosomal region near LOC152687 with 78% sequence identity. LOC152687 encodes a
protein similar to Zinc finger protein 91 (zinc finger protein HTF10) (HPF7), and is located
at chromosome 4p16.3.

[0360] CPS 209 corresponds to ZNF261 which encodes zinc finger protein 261. 

The gene has LocusID: 9203, and is located on chromosome X with reported cytogenetic 
location Xq13.1. The gene product contains a putative zinc-binding motif (MYM).

[0361] CPS 210 corresponds to MYH7 which encodes myosin, heavy polypeptide 7,

cardiac muscle, beta. The gene has LocusID: 4625, and is located on chromosome 14 with
reported cytogenetic location 14q12. MYH7 encodes the cardiac muscle beta (or slow)
isoform of myosin. Changes in the relative abundance of MYH7 gene product and MYH6
gene product (the alpha, or fast, isoform of cardiac myosin heavy chain) correlate with the
contractile velocity of cardiac muscle. Mutations in MYH7 are associated with familial
hypertrophic cardiomyopathy. MYH7 gene product is a member of the motor protein
family that provides force for muscle contraction.

[0362] Nucleotides 432 to 5869 of SEQ ID NO: 197 (M58018) align with MYH6 

with about 88-98% sequence identity. In particular, nucleotides 5741 to 5869 align with 
MYH6 with 96% sequence identity. MYH6 encodes myosin, heavy polypeptide 6, cardiac 
muscle, alpha (cardiomyopathy, hypertrophic 1). It has LocusID: 4624, and is located at
chromosome 14q12. Cardiac myosin heavy chain 6 alpha is a member of the motor protein
family that provides force for muscle contraction.

[0363] Various fragments in nucleotides 432 to 5543 of M58018 have about 77-90%

sequence identity with MYH1, MYH2, MYH3, MYH4 and MYH13. MYH1 encodes
myosin, heavy polypeptide 1, skeletal muscle, adult, and has LocusID: 4619. MYH2 
encodes myosin, heavy polypeptide 2, skeletal muscle, adult, and has LocusID: 4620. 
MYH3 encodes myosin, heavy polypeptide 3, skeletal muscle, embryonic, and has 
LocusID: 4621. MYH4 encodes myosin, heavy polypeptide 4, skeletal muscle, and has 
LocusID: 4622. MYH13 encodes myosin, heavy polypeptide 13, skeletal muscle, and has 
LocusID: 8735. MYH1, MYH2, MYH3 and MYH4 are all reportedly located at
chromosome 17p13.1. MYH13 has reported cytogenetic location 17p13.

[0364] Myosin is a major contractile protein which converts chemical energy into

mechanical energy through the hydrolysis of ATP. Myosin is a hexameric protein 
composed of a pair of myosin heavy chains (MYH) and two pairs of nonidentical light 
chains. Myosin heavy chains are encoded by a multigene family. In mammals at least 10 
different myosin heavy chain (MYH) isoforms have been described from striated, smooth, 
and nonmuscle cells. These isoforms show expression that is spatially and temporally 
regulated during development. The proteins encoded by MYH1, MYH4 and MYH13
contain ATPase head and rod-like tail domains. Myosin heavy chain 1 and 13 may provide 
force for muscle contraction, cytokinesis and phagocytosis. Skeletal muscle myosin heavy 
chain 3 and 4 may provide force for muscle contraction. 

[0365] In addition, nucleotides 1494 to 1654 of M58018 align with MYH7B and a 

chromosomal region near FLJ22037 with about 88-92% sequence identity. FLJ22037 
encodes hypothetical protein FLJ22037, and has LocusID: 84176. It is located on 
chromosome 7 with reported cytogenetic location 7q11.21. MYH7B encodes myosin,
heavy polypeptide 7B, cardiac muscle, beta. MYH7B has LocusID: 57644, and is located at
chromosome 20q11.21.

[0366] CPS 211 corresponds to IL1B which encodes interleukin 1, beta. The gene 

has LocusID: 3553, and is located on chromosome 2 with reported cytogenetic location 
2q14. Interleukin 1 beta may initiate and amplify the immune and inflammatory responses.

[0367] CPS 212 corresponds to STX1A which encodes syntaxin 1A (brain). The

gene has LocusID: 6804, and is located on chromosome 7 with reported cytogenetic
location 7q11.23. Syntaxin 1A (brain) may be involved in intracellular transport and
neurotransmitter release.

[0368] CPS 213 corresponds to ATPASEP (ATP9B) which encodes ATPase type 

IV, phospholipid transporting (P-type)(putative) (ATPase, Class II, type 9B). The gene has 
LocusID: 11071, and is located on chromosome 18 with reported cytogenetic location 
18q23. 

[0369] CPS 214 corresponds to CR1 which encodes complement component (3b/4b) 

receptor 1, including Knops blood group system. The gene has LocusID: 1378, and is 
located on chromosome 1 with reported cytogenetic location 1q32. The gene comprises
nucleotides 2769865 to 2857756 of chromosome 1. This gene encodes a membrane
glycoprotein found on peripheral blood cells, glomerular podocytes, and follicular dendritic 
cells. The protein encoded by the gene is a receptor for complement components C3b and 
C4b and regulates the activity of the complement cascade. Variation in the encoded protein 
is the basis of the Knops blood group system. The two common alleles, F and S, differ by 8 
exons and are thought to be the result of an unequal crossover event. A secreted form of the 
encoded protein present in plasma has been described, but its full length nature has not been 
determined. The encoded protein has short consensus repeats (SCRs). 

[0370] CPS 214 also aligns with CR1L with about 93% sequence identity. CR1L 

encodes complement component (3b/4b) receptor 1-like. It has LocusID: 1379, and is 
located at chromosome 1q32.1.

[0371] CPS 215 corresponds to DKFZP586M1523 which encodes 

DKFZP586M1523 protein. The gene has LocusID: 25941, and is located on chromosome 
18 with reported cytogenetic location 18q12.1.

[0372] CPS 215 also aligns with LOC201347 with over 99% sequence identity. 

LOC201347 is located in an intron of BRUNOL4 which encodes bruno-like 4, RNA
binding protein (Drosophila). BRUNOL4 has LocusID: 56853, and is located on
chromosome 18 with reported cytogenetic location 18q12.

[0373] CPS 216 corresponds to KRT1 which encodes keratin 1 (epidermolytic 

hyperkeratosis). The gene has LocusID: 3848, and is located on chromosome 12 with 
reported cytogenetic location 12q12-q13. The protein encoded by this gene is a member of
the keratin gene family. The type II cytokeratins include basic or neutral proteins which are 
arranged in pairs of heterotypic keratin chains coexpressed during differentiation of simple 
and stratified epithelial tissues. The type II cytokeratin encoded by KRT1 can be expressed 
in the spinous and granular layers of the epidermis with family member KRT10. Mutations 
in KRT1 and KRT10 genes may be associated with bullous congenital ichthyosiform 
erythroderma. The type II cytokeratins are clustered in a region of chromosome 12q12-q13.

[0374] Nucleotides 4076 to 4275 of SEQ ID NO: 203 (M98776) have 87% sequence

identity with KRT2A. KRT2A encodes keratin 2A (epidermal ichthyosis bullosa of
Siemens). The gene has LocusID: 3849, and is located on chromosome 12 with reported
cytogenetic location 12q11-q13. KRT2A gene is a member of the keratin gene family. The
protein encoded by KRT2A gene is expressed in the upper spinous layer of epidermal 
keratinocytes. Mutations in this gene may be associated with bullous congenital 
ichthyosiform erythroderma. Keratin 2A is an intermediate filament component that may 
have a role in terminal cornification of epidermal keratinocytes. Nucleotides 3203 to 3246 
of SEQ ID NO: 203 have 93% sequence identity with an intron sequence of LOC221618 
which is located at chromosome 6p21.32.

[0375] CPS 217 corresponds to UNK_AF070571 (EXT1). CPS 217 aligns to the 3' 

untranslated region of EXT1. EXT1 encodes exostoses (multiple) 1, and has LocusID: 2131
with reported cytogenetic location 8q24.11-q24.13. Exostoses (multiple) 1 (EXT1) is an
ER-resident type II transmembrane glycosyltransferase involved in the chain elongation 
step of heparan sulfate biosynthesis. It is involved in hereditary multiple exostoses, a 
disorder characterized by cartilaginous excrescences near the ends of the diaphyses of the 
bones of the extremities. 

[0376] CPS 218 corresponds to PPP3CB which encodes protein phosphatase 3 

(formerly 2B), catalytic subunit, beta isoform (calcineurin A beta). The gene has LocusID: 
5532, and is located on chromosome 10 with reported cytogenetic location 10q21-q22. The 
product encoded by the gene, which is also known as catalytic subunit of calmodulin
regulated protein phosphatase 3, may regulate activity of transcription factors involved in
signal transduction and growth control. 

[0377] CPS 219 corresponds to QSCN6 which encodes quiescin Q6. The gene has

LocusID: 5768, and is located on chromosome 1 with reported cytogenetic location 1q24.
The protein encoded by the gene contains domains of thioredoxin and ERV1, members of 
two long-standing gene families. The expression of QSCN6 gene is induced when 
fibroblasts begin to exit the proliferative cycle and enter quiescence, suggesting that QSCN6 
gene may play a role in growth regulation. Quiescin Q6 has similarity to thioredoxins and 
S. cerevisiae Erv1p.

[0378] CPS 220 corresponds to PRF1 which encodes perforin 1 (pore forming 

protein). The gene has LocusID: 5551, and is located on chromosome 10 with reported 
cytogenetic location 10q22. Perforin 1 is a cytolytic, channel-forming protein, and may 
play a role in clearing virally infected host cells and tumor cells. CPS 220 is located in the 
3' untranslated region of the gene. 

[0379] Affymetrix annotation suggests that CPS 221 corresponds to FCGR3B. 

FCGR3B encodes Fc fragment of IgG, low affinity IIIb, receptor for (CD16). The gene has
LocusID: 2215, and is located at chromosome 1q23.

[0380] Blast search against the Entrez human genome database shows that CPS 221 

also aligns with FCGR3A with over 97% sequence identity. FCGR3A encodes Fc fragment
of IgG, low affinity IIIa, receptor for (CD16). FCGR3A has LocusID: 2214, and is located
on chromosome 1 with reported cytogenetic location 1q23. FCGR3A gene product is a
Type III Fc gamma receptor. It can associate with zeta chain of the T-cell receptor complex 
(CD3Z), and is a member of the immunoglobulin superfamily. FCGR3B gene is located 3' 
to FCGR3A gene on chromosome 1. 

[0381] CPS 222 corresponds to PTGS2 which encodes prostaglandin-endoperoxide 

synthase 2 (prostaglandin G/H synthase and cyclooxygenase). The gene has LocusID: 
5743, and is located on chromosome 1 with reported cytogenetic location 1q25.2-q25.3.
Prostaglandin-endoperoxide synthase (PTGS), also known as cyclooxygenase, is a key 
enzyme in prostaglandin biosynthesis, and acts both as a dioxygenase and as a peroxidase. 
There are two isozymes of PTGS: a constitutive PTGS1 and an inducible PTGS2. The two 
isoforms differ in their regulation of expression and tissue distribution. PTGS2 gene
encodes PTGS2 protein, which shows 86-89% amino acid sequence identity with mouse,
rat, sheep, bovine, horse and rabbit PTGS2 proteins. Human PTGS2 gene appears to be
expressed in a limited number of cell types and regulated by specific stimulatory events,
suggesting that it may be responsible for the prostanoid biosynthesis involved in 
inflammation and mitogenesis. The expression of PTGS2 gene may be deregulated in 
epithelial tumors. PTGS2 protein may regulate angiogenesis and cell migration, and 
catalyze the rate-limiting step in the formation of inflammatory prostaglandins.

[0382] CPS 223 corresponds to OPHN1 which encodes oligophrenin 1. The gene

has LocusID: 4983, and is located on chromosome X with reported cytogenetic location
Xq12. OPHN1 has at least 25 exons and may encode a Rho-GTPase-activating
protein. The Rho proteins are important mediators of intracellular signal transduction which 
affects cell migration and cell morphogenesis. Mutations in OPHN1 gene may be 
responsible for non-specific X-linked mental retardation. Nucleotides 2971 to 3363 of SEQ 
ID NO: 210 (AJ001189) have 84% sequence identity with an intron sequence of putative 
gene LOC200861 which is located at chromosome 3p24.1.

[0383] CPS 224 corresponds to VSNL1 which encodes visinin-like 1. The gene has 

LocusID: 7447, and is located on chromosome 2 with reported cytogenetic location 2p24.3. 
Visinin-like protein 1 may bind calcium. The protein is similar to rat Vsnl1.

[0384] CPS 225 corresponds to FECH which encodes ferrochelatase

(protoporphyria). The gene has LocusID: 2235, and is located on chromosome 18 with 
reported cytogenetic location 18q21.3. Ferrochelatase is localized to the mitochondrion 
where it catalyzes the insertion of the ferrous form of iron into protoporphyrin IX in the heme
synthesis pathway. Defects in ferrochelatase are associated with protoporphyria. CPS 225 
is located in the 3' untranslated region of the gene. 

[0385] SEQ ID NO: 282 (D00726) also aligns to FECH with over 97% sequence 

identity, and can be used to design probes for detecting the expression level of FECH. 
Nucleotides 167 to 1972 of SEQ ID NO: 282 have 82-84% sequence identity with 
LOC205467. LOC205467 is a putative gene, and is located on chromosome 3 with reported
cytogenetic location 3p22.1. 

[0386] CPS 226 corresponds to KIAA0483 which encodes KIAA0483 protein. The 

gene has LocusID: 23219, and is located on chromosome 1 with reported cytogenetic 
location 1q41.

CPS 227 corresponds to HK3 which encodes hexokinase 3 (white

cell). The gene has LocusID: 3101, and is located on chromosome 5 with reported 
cytogenetic location 5q35.2. Hexokinases phosphorylate glucose to produce glucose-6-phosphate,
thus committing glucose to the glycolytic pathway. HK3 gene encodes
hexokinase 3 which is similar to hexokinases 1 and 2. Hexokinase 3 is an allosteric enzyme
and can be inhibited by its product glucose-6-phosphate. 

[0387] CPS 228 corresponds to MS4A3 which encodes membrane-spanning 4- 

domains, subfamily A, member 3 (hematopoietic cell-specific). The gene has LocusID: 
932, and is located on chromosome 11 with reported cytogenetic location 11q12-q13.1. The
gene product has low similarity to CD20 and the beta subunit of FCER1B. It contains four 
predicted membrane-spanning domains, and may play a role in signal transduction.

[0388] CPS 229 corresponds to SCYA20 which encodes small inducible cytokine

subfamily A (Cys-Cys), member 20. The gene has LocusID: 6364, and is located on 
chromosome 2 with reported cytogenetic location 2q33-q37. The gene product Cytokine 
A20 (exodus) is a chemotactic factor for lymphocytes, but not a chemotactic factor for 
monocytes. 

[0389] CPS 230 corresponds to C1QR1 which encodes complement component 1, q 

subcomponent, receptor 1. The gene has LocusID: 22918, and is located on chromosome 
20 with reported cytogenetic location 20p11.21. This gene encodes a type I membrane
protein. The encoded protein acts as a receptor for complement protein C1q, mannose-binding
lectin, and pulmonary surfactant protein A. The protein is a functional receptor
involved in ligand-mediated enhancement of phagocytosis. It may play a role in phagocytic 
destruction of pathogens and immune complexes. 

[0390] CPS 230 also aligns with a chromosomal region near putative gene 

LOC200421 with about 99% sequence identity. LOC200421 has reported cytogenetic 
location 2p12.

[0391] CPS 231 corresponds to POU1F1 which encodes POU domain, class 1, 

transcription factor 1 (Pit1, growth hormone factor 1). The gene has LocusID: 5449, and is
located on chromosome 3 with reported cytogenetic location 3p11. The gene product, also
known as POU homeodomain transcription factor 1, may regulate PRL, GH and TSH genes.

[0392] CPS 232 corresponds to TKTL1 which encodes transketolase-like 1. The

gene has LocusID: 8277, and is located on chromosome X with reported cytogenetic
location Xq28. Transketolase 1 is a thiamine pyrophosphate-dependent enzyme in the
pentose phosphate pathway. 

[0393] CPS 234 corresponds to CCNT2 which encodes cyclin T2. The gene has 

LocusID: 905, and is located on chromosome 2 with reported cytogenetic location 2q14.3.
The protein encoded by this gene belongs to a highly conserved cyclin family, whose
members are characterized by a dramatic periodicity in protein abundance through the cell
cycle. Cyclins function as regulators of CDK kinases. Different cyclins exhibit distinct 
expression and degradation patterns which contribute to the temporal coordination of each 
mitotic event. Cyclin T2 and its kinase partner CDK9 were found to be subunits of the 
transcription elongation factor p-TEFb. The p-TEFb complex containing cyclin T2 was 
reported to interact with, and act as a negative regulator of human immunodeficiency virus 
type 1 (HIV-1) Tat protein. At least two alternatively spliced transcript variants, which 
encode distinct isoforms, have been described. 

[0394] Nucleotides 261 to 723 and 936 to 1349 of SEQ ID NO: 220 (AF048732) 

have about 88% sequence identity to a chromosomal region on chromosome 1.

[0395] CPS 235 corresponds to ATP6V1H which encodes ATPase, H+ transporting,

lysosomal 50/57kD V1 subunit H. The gene has LocusID: 51606, and is located on
chromosome 8 with reported cytogenetic location 8p22-q22.3. The polypeptide encoded by
the gene is also known as CGI-11 protein [H. sapiens]. An intron of ATP6V1H gene
includes RGS20 gene. RGS20 encodes regulator of G-protein signalling 20, and has 
LocusID: 8601. 

[0396] CPS 236 corresponds to FN1 which encodes fibronectin 1. The gene has 

LocusID: 2335, and is located on chromosome 2 with reported cytogenetic location 2q34. 
Fibronectin is a glycoprotein present in a soluble dimeric form in plasma, and in a dimeric 
or multimeric form at the cell surface and in extracellular matrix. Fibronectin is involved in 
cell adhesion and migration processes including embryogenesis, wound healing, blood 
coagulation, host defense, and metastasis. FN1 gene has three regions subject to alternative 
splicing, with the potential to produce 20 different transcript variants. 

[0397] CPS 237 corresponds to UNK_J04178 which is located in an intron of

HEXA. HEXA encodes hexosaminidase A (alpha polypeptide). HEXA has LocusID: 
3073, and is located on chromosome 15 with reported cytogenetic location 15q23-q24. 
Hexosaminidase A is the alpha subunit of the lysosomal enzyme beta-hexosaminidase 
which, together with the cofactor GM2 activator protein, catalyzes the degradation of the 
ganglioside GM2, and other molecules containing terminal N-acetyl hexosamines. Beta- 
hexosaminidase is composed of two subunits, alpha and beta, which are encoded by 
separate genes. Both beta-hexosaminidase alpha and beta subunits are members of family 
20 of glycosyl hydrolases. Mutations in the alpha or beta subunit genes may lead to an 
accumulation of GM2 ganglioside in neurons and neurodegenerative disorders termed the
GM2 gangliosidoses. Alpha subunit gene mutations may lead to Tay-Sachs disease
(GM2-gangliosidosis type I). The chromosomal region that aligns to CPS 237 is located in an
intron of HEXA. 

[0398] CPS 237 also aligns with LOC145709 which is a hypothetical gene 

supported by J04178. LOC145709 has reported cytogenetic location 15q22.32.

[0399] CPS 239 corresponds to NR2C1 which encodes nuclear receptor subfamily

2, group C, member 1. The gene has LocusID: 7181, and is located on chromosome 12 with 
reported cytogenetic location 12q21.32-q21.33. The gene product can exist in multiple 
isoforms with different ligand-binding domains. 

[0400] CPS 240 corresponds to RASSF2 (KIAA0168) which encodes Ras 

association (RalGDS/AF-6) domain family 2. The gene has LocusID: 9770, and is located 
on chromosome 20 with reported cytogenetic location 20pter-p12.1. The alternative name
for this gene product is KIAA0168 protein. 

[0401] CPS 241 corresponds to IL6 which encodes interleukin 6 (interferon, beta 2). 

The gene has LocusID: 3569, and is located on chromosome 7 with reported cytogenetic 
location 7p21. Interleukin 6 (interferon-beta 2) may induce the maturation of B cells into 
immunoglobulin-secreting cells. 

[0402] CPS 242 corresponds to KIAA0372 which encodes KIAA0372 gene product. 

The gene has LocusID: 9652, and is located on chromosome 5 with reported cytogenetic 
location 5q21.1-q21.2.

[0403] CPS 243 corresponds to CYP4F2 which encodes cytochrome P450, 

subfamily IVF, polypeptide 2. The gene has LocusID: 8529, and is located on chromosome 
19 with reported cytogenetic location 19pter-p13.11. This gene encodes a member of the
cytochrome P450 superfamily of enzymes. The cytochrome P450 proteins are 
monooxygenases which catalyze many reactions involved in drug metabolism and synthesis 
of cholesterol, steroids and other lipids. The cytochrome P450 proteins localize to the 
endoplasmic reticulum. They may start the process of inactivating and degrading 
leukotriene B4, a potent mediator of inflammation. CYP4F2 gene is part of a cluster of 
cytochrome P450 genes on chromosome 19. Another member of this family, CYP4F11, is
approximately 16 kb away. 

[0404] CPS 243 also aligns with CYP4F3 with about 97% sequence identity. 

CYP4F3 encodes cytochrome P450, subfamily IVF, polypeptide 3 (leukotriene B4 omega 
hydroxylase). It has LocusID: 4051, and is located on chromosome 19 with reported
cytogenetic location 19p13.2. CYP4F3 encodes a member of the cytochrome P450
superfamily of enzymes. This gene is also part of a cluster of cytochrome P450 genes on 
chromosome 19. Another member of this family, CYP4F8, is approximately 18 kb away. 
CYP4F3 gene product may convert leukotriene B4 into the less active 20-hydroxy- 
leukotriene B4. 

[0405] Various fragments in nucleotides 253 to 1639 of U02388 (SEQ ID NO: 228) 

align to various genes with about 83-93% sequence identity. These genes include 
LOC126538, LOC126537, LOC126407, CYP4F12, and CYP4F8. LOC126538 and 
LOC126537 encode proteins similar to cytochrome P450, subfamily IVF, polypeptide 2
(leukotriene B4 omega-hydroxylase) (leukotriene-B4 20-monooxygenase). Both genes are
located at chromosome 19p13.12. LOC126407 encodes a protein similar to cytochrome
P450, and is located on chromosome 19. CYP4F12 encodes cytochrome P450, subfamily
IVF, polypeptide 12. CYP4F12 has LocusID: 66002. CYP4F8 encodes cytochrome P450,
subfamily IVF, polypeptide 8, and has LocusID: 11283.

[0406] Nucleotides 446 to 1457 of SEQ ID NO: 228 (U02388) also align with a 

chromosomal region between the coding sequences of LOC222275 and CYP4F11. 
LOC222275 encodes a protein similar to mitochondrial RNA polymerase, and has reported 
cytogenetic location 19p13.12. CYP4F11 encodes cytochrome P450, subfamily IVF,
polypeptide 11, and has LocusID: 57834. CYP4F11 has reported cytogenetic location
19p13.1.

[0407] CPS 244 corresponds to STIP1 which encodes stress-induced- 

phosphoprotein 1 (Hsp70/Hsp90-organizing protein). The gene has LocusID: 10963, and is 
located on chromosome 11 with reported cytogenetic location 11q13.

[0408] Nucleotides 1 to 1086 of SEQ ID NO: 229 (M86752) have 100% sequence 

identity with STIP1. STIP1 encodes stress-induced-phosphoprotein 1 (Hsp70/Hsp90-organizing
protein). The gene has LocusID: 10963, and is located on chromosome 11 with
reported cytogenetic location 11q13. The gene product is similar to S. cerevisiae Sti1p, and
has TPR repeats. The sequence alignment between nucleotides 1 to 1086 of M86752 and
STIP1 is located in an intron of putative gene LRP16. LRP16 encodes LRP16 protein, and
has LocusID: 28992. LRP16 has reported cytogenetic location 11q11. LRP16 gene product
contains a region having low similarity to the H2A histone family. 

[0409] Nucleotides 69 to 1086 of SEQ ID NO: 229 have over 99% sequence 

identity with a chromosomal region between the coding sequences of NAALADASEL and
LOC220489. NAALADASEL encodes N-acetylated alpha-linked acidic dipeptidase-like
(ILEAL DIPEPTIDYLPEPTIDASE), and has LocusID: 10004. LOC220489 encodes a
protein similar to stress-induced phosphoprotein 1.

[0410] Moreover, CPS 244 aligns with LOC170030 and a region near LOC121392 

with 85-93% sequence identity. LOC170030 encodes a protein similar to transformation- 
sensitive protein IEF SSP 3521 (human). It is located at chromosome Xq21.1. LOC121392 
encodes a protein similar to keratin complex 2, gene 6g. It is located at chromosome 12q12.

[0411] CPS 245 corresponds to SERPINH2 (CBP2) which encodes serine (or

cysteine) proteinase inhibitor, clade H (heat shock protein 47), member 2. The gene has 
LocusID: 872, and is located on chromosome 11 with reported cytogenetic location 
11q13.5. The gene product is also known as collagen-binding protein 2 or colligin 2. It is a
collagen-binding protein that acts as a heat shock protein. 

[0412] CPS 245 also aligns with LOC158172 with about 91% sequence identity. 

LOC158172 encodes a protein similar to collagen-binding protein 2 precursor (colligin 2)
(Rheumatoid arthritis related antigen RA-A47). LOC158172 is located at chromosome
9p11.2.

[0413] CPS 247 corresponds to NCF1 which encodes neutrophil cytosolic factor 1 

(47kD, chronic granulomatous disease, autosomal 1). The gene has LocusID: 4687, and is 
located on chromosome 7 with reported cytogenetic location 7q11.23. NCF1 encodes
neutrophil cytosolic factor 1, the 47-kilodalton cytosolic subunit of the multi-protein 
complex known as NADPH oxidase found in neutrophils. This oxidase produces a burst of 
superoxide which is delivered to the lumen of the neutrophil phagosome. Mutations in 
NCF1, as well as in other NADPH oxidase subunits, may result in chronic granulomatous 
disease. 

[0414] CPS 247 also aligns with LOC220830 with over 95% sequence identity. 

LOC220830 encodes a protein similar to neutrophil cytosolic factor 1 (47kD, chronic 
granulomatous disease, autosomal 1). LOC220830 is located on chromosome 7 with 
reported cytogenetic location 7p13.

[0415] Affymetrix annotation suggests that CPS 248 corresponds to CHN2. Blast 

search against the Entrez human genome database shows that CPS 248 also aligns to the 3' 
untranslated region of LOC222172 with 99% sequence identity. LOC222172 encodes Beta- 
chimaerin (Beta-chimerin). The gene is located on chromosome 7 with reported cytogenetic 
location 7p21.1-p15.3.

[0416] Nucleotides 456 to 2446 of SEQ ID NO: 284 (U07223) align with

LOC222172 with over 97% sequence identity. Nucleotides 4 to 473 of SEQ ID NO: 284 
(U07223) have 97% sequence identity with GFAP. GFAP encodes glial fibrillary acidic 
protein. It has LocusID: 2670, and is located on chromosome 17 with reported cytogenetic 
location 17q21. Glial fibrillary acidic protein is an intermediate filament protein.

[0417] CPS 249 corresponds to ABL1 which encodes v-abl Abelson murine

leukemia viral oncogene homolog 1. The gene has LocusID: 25, and is located on 
chromosome 9 with reported cytogenetic location 9q34.1. The ABL1 protooncogene 
encodes a cytoplasmic and nuclear protein tyrosine kinase that has been implicated in 
processes of cell differentiation, cell division, cell adhesion, and stress response. Activity of 
ABL1 protein is negatively regulated by its SH3 domain, and deletion of the SH3 domain 
turns ABL1 into an oncogene. The t(9;22) translocation results in the head-to-tail fusion of 
the BCR (MIM:151410) and ABL1 genes present in many cases of chronic myelogenous
leukemia. The DNA-binding activity of the ubiquitously expressed ABL1 tyrosine kinase is 
regulated by CDC2-mediated phosphorylation, suggesting a cell cycle function for ABL1. 
The ABL1 gene can be expressed as a 6- or 7-kb mRNA transcript, with alternatively 
spliced first exons spliced to the common exons 2-11. 

[0418] CPS 250 corresponds to FLOT1 which encodes flotillin 1. The gene has 

LocusID: 10211, and is located on chromosome 6 with reported cytogenetic location 
6p21.3. Caveolae are small domains on the inner cell membrane involved in vesicular 
trafficking and signal transduction. FLOT1 encodes a caveolae-associated, integral 
membrane protein. The function of flotillin 1 has not been determined. Flotillin 1 is similar 
to murine flotillin (Mm.2931). 

[0419] CPS 250 also aligns to an intron sequence of LOC203011 with about 91% 

sequence identity. LOC203011 is located at chromosome 8q23.3. 

[0420] CPS 251 corresponds to REV3L which encodes REV3-like, catalytic subunit 

of DNA polymerase zeta (yeast). The gene has LocusID: 5980, and is located on 
chromosome 6 with reported cytogenetic location 6q21. Catalytic subunit of DNA 
polymerase zeta acts in translesion replication, and may be involved in mutagenesis. 
[0421] Affymetrix annotation suggests that CPS 252 corresponds to MUC3 which 

encodes mucin 3, intestinal. The gene has LocusID: 4584, and is located on chromosome 7 
with reported cytogenetic location 7q22. 
[0422] CPS 253 corresponds to SMARCA4 which encodes SWI/SNF related, 

matrix associated, actin dependent regulator of chromatin, subfamily a, member 4. The 
gene has LocusID: 6597, and is located on chromosome 19 with reported cytogenetic 
location 19p13.2. The protein encoded by this gene is a member of the SWI/SNF family of 
proteins and is similar to the brahma protein of Drosophila. Members of this family have 
helicase and ATPase activities and are thought to regulate transcription of certain genes by 
altering the chromatin structure around those genes. The encoded protein is part of the large 
ATP-dependent chromatin remodeling complex SNF/SWI, which is required for 
transcriptional activation of genes normally repressed by chromatin. In addition, the 
encoded protein can bind BRCA1, as well as regulate the expression of the tumorigenic 
protein CD44. Alternatively spliced transcripts have been found for this gene. 
[0423] Nucleotides 2063 to 2094 of SEQ ID NO: 238 (U29175) have 100% 

sequence identity with various regions in the human genome. These regions include 
LOC203511, which is located at chromosome Xp22.31, and a chromosomal region near 
LOC200164 on chromosome 1. 

[0424] CPS 254 corresponds to LOC92684 which encodes hypothetical gene 

supported by AF035314. The gene is located on chromosome 20 with reported cytogenetic 
location 20p11.21. The sequence alignment between CPS 254 and LOC92684 is located in 
an intron of C20orf19. C20orf19 refers to chromosome 20 open reading frame 19. It has 
LocusID: 55857, and is reportedly located at chromosome 20pter-q11.23. 
[0425] CPS 255 corresponds to EEF1A2 which encodes eukaryotic translation 

elongation factor 1 alpha 2. The gene has LocusID: 1917, and is located on chromosome 20 
with reported cytogenetic location 20q13.3. The gene product has a guanine nucleotide- 
binding site, and may be involved in the binding of aminoacyl-tRNA to the ribosome during 
peptide synthesis. 

[0426] CPS 256 corresponds to BRF2 (ZFP36L2) which encodes zinc finger protein 

36, C3H type-like 2. The gene has LocusID: 678, and is located on chromosome 2 with 
reported cytogenetic location 2p22.3-p21. This gene is a member of the TIS11 family of 
early response genes. Family members are induced by various agonists such as the phorbol 
ester TPA and the polypeptide mitogen EGF. The protein encoded by this gene contains a 
distinguishing putative zinc finger domain with a repeating cys-his motif. The encoded 
protein is a putative nuclear transcription factor, and may function in regulating the 
response to growth factors. The sequence alignment between CPS 256 and BRF2 overlaps 
LOC151103 and LOC165204. 

[0427] Nucleotides 3862 to 4187 and 4238 to 4907 of SEQ ID NO: 286 have 84- 

86% sequence identity to a chromosomal region near LOC143974. LOC143974 is located 
at chromosome 11p14.1. Nucleotides 5004 to 5497 of SEQ ID NO: 286 align to an intron 
sequence of KIAA1301 with 82% sequence identity. KIAA1301 encodes KIAA1301 
protein, and is located at chromosome 2q33.1. 

[0428] CPS 257 corresponds to SNRPG which encodes small nuclear 

ribonucleoprotein polypeptide G. The gene has LocusID: 6637, and is located on 
chromosome 2 with reported cytogenetic location 2p12. The gene product is also known as 
spliceosomal snRNA-associated Sm core protein G, and may be involved in the biogenesis 
of the snRNPs. 

[0429] CPS 257, or fragments thereof, also aligns to various regions or genes with 

about 95-96% sequence identity. These regions or genes include a chromosomal region 
between LOC162681 and LOC125307, an intron sequence of RGS19IP1, an intron 
sequence of FLJ10748, a chromosomal region near SKD3, POLE2, and an intron sequence 
of OPTN. Both LOC162681 and LOC125307 have reported cytogenetic location 18q21.2. 
RGS19IP1 encodes regulator of G-protein signalling 19 interacting protein 1, and has 
LocusID: 10755. RGS19IP1 is located on chromosome 19 with reported cytogenetic 
location 19p13.1. FLJ10748 encodes hypothetical protein FLJ10748, and is reportedly 
located at chromosome 1q31.2. SKD3 encodes suppressor of potassium transport defect 3. 
It has LocusID: 81570 and reported cytogenetic location 11q13.3. POLE2 encodes 
polymerase (DNA directed), epsilon 2 (p59 subunit), and has LocusID: 5427. It is located 
at chromosome 14q21-q22. OPTN encodes optineurin, and has LocusID: 10133. OPTN is 
located at chromosome 10p12.33. 

[0430] In addition, fragments of CPS 257 align to various regions or genes with 

about 85-92% sequence identity. These regions or genes include a chromosomal region 
near LOC164917, a region located 5' to ABCA5, an intron sequence of KIAA1170, and 
chromosomal regions near SPG3A, LOC201203, LOC205322, LOC203775 and ERG, 
respectively. LOC164917 is located at chromosome 2q12.2. ABCA5 encodes ATP- 
binding cassette, sub-family A (ABC1), member 5. ABCA5 has LocusID: 23461, and is 
located at chromosome 17q24.3. KIAA1170 encodes KIAA1170 protein, and is located at 
chromosome 7q31.1. SPG3A encodes spastic paraplegia 3A (autosomal dominant). 
SPG3A has LocusID: 51062, and is located at chromosome 14q21.3. LOC201203, 
LOC205322, LOC203775 and ERG are located at chromosome 17q22, 2p23.3, 10q26.2 and 
21q22.3, respectively. LOC203775 encodes a protein similar to high mobility group protein 
4 (HMG-4) (high mobility group protein 2a) (HMG-2a). ERG encodes v-ets 
erythroblastosis virus E26 oncogene like (avian), and has LocusID: 2078. 
[0431] CPS 258 corresponds to NUMA1 which encodes nuclear mitotic apparatus 

protein 1. The gene has LocusID: 4926, and is located on chromosome 11 with reported 
cytogenetic location 11q13. The gene product is a structural component of the nucleus. It 
contains a predicted coiled-coil domain, and is predicted to have a role in nuclear 
reassembly in late mitosis. 

[0432] CPS 259 corresponds to AKR1B1 which encodes aldo-keto reductase family 

1, member B1 (aldose reductase). The gene has LocusID: 231, and is located on 
chromosome 7 with reported cytogenetic location 7q35. The gene product is also known as 
aldo-keto reductase 1B1 (aldose reductase, aldehyde dehydrogenase). It can reduce glucose 
and other carbonyl-containing substrates. The gene product is a member of the NADPH- 
dependent aldo-keto reductase superfamily. 

[0433] Fragments of SEQ ID NO: 289 align to other genes or regions with about 83- 

92% sequence identity. These genes or regions include LOC126242, LOC163862, 
LOC131710, LOC145401, LOC170139, LOC125836, and a chromosomal region near 
LOC220082. LOC126242 encodes a protein similar to aldose reductase (AR) (aldehyde 
reductase), and is located at chromosome 19q13.12. LOC163862 also encodes a protein 
similar to aldose reductase. It is located at chromosome 1q41. LOC131710 and 
LOC125836 encode proteins similar to aldose reductase (E.C.1.1.1.21) (Mutant With Tyr 
48 Replaced By His (Y48h) Complexed With Nadp+ And Citrate), and are located at 
chromosome 3p13 and 18p11.21, respectively. LOC145401 encodes a protein similar to 
aldo-keto reductase family 1, member B1 (aldose reductase). LOC145401 is located at 
chromosome 14q22.3. LOC170139 is located at chromosome Xq23, and encodes a protein 
similar to aldose reductase (AR) (aldehyde reductase). LOC220082 is located at 
chromosome 13q14.11. 

[0434] CPS 260 corresponds to SMARCE1 which encodes SWI/SNF related, matrix 

associated, actin dependent regulator of chromatin, subfamily e, member 1. The gene has 
LocusID: 6605, and is located on chromosome 17 with reported cytogenetic location 
17q21.1. The protein encoded by this gene is part of the large ATP-dependent chromatin 
remodeling complex SWI/SNF, which is required for transcriptional activation of genes 
normally repressed by chromatin. The encoded protein, either alone or when in the 
SWI/SNF complex, can bind to 4-way junction DNA, which is thought to mimic the 
topology of DNA as it enters or exits the nucleosome. The encoded protein contains a 
DNA-binding HMG domain, but disruption of this domain does not abolish the DNA- 
binding or nucleosome-displacement activities of the SWI/SNF complex. SNF/SWI 
complex is associated with the nuclear matrix and implicated in regulation of transcription 
by affecting chromatin structure. 

[0435] SEQ ID NO: 290 aligns to SMARCE1 with over 98% sequence identity and 

therefore, can be used to prepare probes directed to SMARCE1. Nucleotides 10 to 1377 of 
SEQ ID NO: 290 (AF035262) also show about 90-94% sequence identity with LOC160863, 
LOC145357 and LOC134699. All three of these putative genes encode proteins similar to 
SWI/SNF related, matrix associated, actin dependent regulator of chromatin, subfamily e, 
member 1. LOC160863, LOC145357 and LOC134699 are located at chromosome 
13q14.11, 14q11.1 and 6q16.1, respectively. 
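
By way of a non-limiting illustration, the percent sequence identity values recited in this section can be understood with the following Python sketch. It assumes that two sequences have already been aligned (for example, by a BLAST search) and are supplied as gapped strings of equal length. The function name percent_identity, and the convention of counting every aligned column (including gapped columns) in the denominator, are assumptions made for this example and are not taken from the present disclosure.

```python
def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Percent identity between two pre-aligned sequences.

    Both inputs are gapped alignment rows of equal length ('-' marks a gap).
    Assumption: identical, non-gap columns are divided by the number of
    alignment columns, skipping columns in which both rows are gaps.
    """
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    columns = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == "-" and b == "-":
            continue  # a column that is a gap in both rows carries no information
        columns += 1
        if a == b and a != "-":
            matches += 1
    if columns == 0:
        raise ValueError("no aligned columns")
    return 100.0 * matches / columns


if __name__ == "__main__":
    # Hypothetical 10-column alignment with a single mismatch.
    print(percent_identity("ACGTACGTAC", "ACGTACGTAT"))
```

Under this convention, a probe sequence that differs from its target at one of ten aligned positions would be reported as having 90% sequence identity.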

[0436] CPS 261 corresponds to KIAA0669 which encodes KIAA0669 gene product. 

The gene has LocusID: 9819, and is located on chromosome 3 with reported cytogenetic 
location 3q25.1. Affymetrix annotation suggests that CPS 262 corresponds to MSF 
which encodes MLL septin-like fusion. The gene has LocusID: 10801, and is located on 
chromosome 17 with reported cytogenetic location 17q25. 

[0437] SEQ ID NO: 292 aligns to a chromosomal region on chromosome 17 with 

over 99% sequence identity. The region includes LOC204508, FLJ12190, LOC204512 and 
LOC197453. All of these genes have reported cytogenetic location 17q25.3. FLJ12190 has 
LocusID: 80141. LOC197453 encodes a protein similar to hypothetical protein SBBI23. 
[0438] CPS 263 corresponds to PTMA which encodes prothymosin, alpha (gene 

sequence 28). The gene has LocusID: 5757, and is located on chromosome 2 with reported 
cytogenetic location 2q35-q36. Prothymosin alpha may be associated with cell 
proliferation. 

[0439] Nucleotides 43 to 1200 of SEQ ID NO: 293 also align to LOC220771 with 

98% sequence identity. LOC220771 encodes prothymosin alpha, and is reportedly located 
at chromosome 5q23.2. In addition, CPS 263, or fragments thereof, align with LOC145123, 
LOC220508, a chromosomal region between PZP and DDX12, and an intron sequence of 
TRIP11 with about 94-95% sequence identity. LOC145123 is located at chromosome 
13q22.3. LOC220508 encodes prothymosin alpha, and is located at chromosome 12p12.3. 
PZP encodes pregnancy-zone protein, and has LocusID: 5858. It is located at chromosome 
12p13-p12.2. DDX12 encodes DEAD/H (Asp-Glu-Ala-Asp/His) box polypeptide 12 
(CHL1-like helicase homolog, S. cerevisiae), and has LocusID: 1664. DDX12 is located at 
chromosome 12p13. TRIP11 encodes thyroid hormone receptor interactor 11, and has 
LocusID: 9321. TRIP11 is located at chromosome 14q31-q32. CPS 263, or fragments 
thereof, also aligns to other regions in the human genome with 90-95% sequence identity. 
[0440] CPS 264 corresponds to KIAA0410 which encodes KIAA0410 gene product. 

The gene has LocusID: 9818, and is located on chromosome 13 with reported cytogenetic 
location 13q12.12. 

[0441] CPS 265 corresponds to PSMD3 which encodes proteasome (prosome, 

macropain) 26S subunit, non-ATPase, 3. The gene has LocusID: 5709, and is located on 
chromosome 17 with reported cytogenetic location 17q12. 

[0442] CPS 266 corresponds to C1QBP which encodes complement component 1, q 

subcomponent binding protein. The gene has LocusID: 708, and is located on chromosome 
17 with reported cytogenetic location 17p13.3. The human complement subcomponent C1q 
associates with C1r and C1s to yield the first component of the serum complement system. 
The protein encoded by C1QBP gene is known to bind to the globular heads of C1q 
molecules and inhibit C1 activation. This protein has also been identified as the p32 subunit 
of pre-mRNA splicing factor SF2, as well as a hyaluronic acid-binding protein. 
[0443] Nucleotides 58 to 1071 and 107 to 1037 of SEQ ID NO: 296 align to 

C1QBPP and an intron sequence of RYR3 with 79-84% sequence identity. C1QBPP 
encodes complement component 1, q subcomponent binding protein, pseudogene. It has 
LocusID: 54098, and is located at chromosome 21q21.1. RYR3 encodes ryanodine receptor 
3. RYR3 has LocusID: 6263, and is located at chromosome 15q14-q15. 
[0444] In addition, nucleotides 1070 to 1227 of SEQ ID NO: 296 align to 

LOC221903 with 100% sequence identity. LOC221903 is a hypothetical gene supported by 
AF000974, BC004999, AF000974, BC021540, BC004249, AJ001902, AF025437, L40374, 
BC004999, AF025437, AK056773, BC002680, AK056773, BC004999, and BC002680. 
The gene is located at chromosome 7q11.1. 

[0445] CPS 267 corresponds to OSR1 which encodes oxidative-stress responsive 1. 

The gene has LocusID: 9943, and is located on chromosome 3 with reported cytogenetic 
location 3p22-p21.3. Oxidative-stress responsive 1 gene has at least 18 exons and is located 
in the vicinity of three other genes: GOLGA4, ITGA9 and HYA22. These four genes are 
considered to be candidate tumor suppressors. Oxidative-stress responsive 1 protein has 
similarity to human Ste20/oxidant stress response kinase-1 and is thought to be involved in 
the response to oxidative stress. Oxidative-stress responsive 1 protein is a putative member 
of SOK (Ste20/oxidant stress response kinase) family, and can be activated by oxidative 
stress. 

[0446] CPS 268 corresponds to CD44 which encodes CD44 antigen (homing 

function and Indian blood group system). The gene has LocusID: 960, and is located on 
chromosome 11 with reported cytogenetic location 11p13. 

[0447] CPS 269 corresponds to CRADD which encodes CASP2 and RIPK1 domain 

containing adaptor with death domain. The gene has LocusID: 8738, and is located on 
chromosome 12 with reported cytogenetic location 12q21.33-q23.1. The gene product is an 
apoptotic adaptor molecule, and may function to couple CASP2 to the FasL/TNF receptor- 
interacting protein RIP. 

[0448] CPS 270 corresponds to CCRL2 which encodes chemokine (C-C motif) 

receptor-like 2. The gene has LocusID: 9034, and is located on chromosome 3 with 
reported cytogenetic location 3p21. This gene encodes a chemokine receptor-like protein, 
which is predicted to be a seven transmembrane protein and most closely related to CCR1. 
Chemokines and their receptors are believed to be critical for the recruitment of effector 
immune cells to the site of inflammation. CCRL2 gene is expressed at high levels in 
primary neutrophils and primary monocytes, and is further upregulated on neutrophil 
activation and during monocyte to macrophage differentiation. CCRL2 gene is mapped to 
the region where the chemokine receptor gene cluster is located. The gene product is a 
member of the G protein-coupled receptor family. 

[0449] CPS 271 corresponds to KIAA0707 (THEA) which encodes thioesterase, 

adipose associated. The gene has LocusID: 26027, and is located on chromosome 1 with 
reported cytogenetic location 1p32.2. 

[0450] CPS 272 corresponds to KIAA1113 (TRIM33) which encodes tripartite 

motif-containing 33. The gene has LocusID: 51592, and is located on chromosome 1 with 
reported cytogenetic location 1p13.1. The protein encoded by this gene is thought to be a 
transcriptional corepressor. The encoded protein is a member of the tripartite motif family. 
The tripartite motif includes three zinc-binding domains, a RING, a B-box type 1 and a B- 
box type 2, and a coiled-coil region. At least three alternatively spliced transcript variants 
for this gene have been described. 

[0451] CPS 273 corresponds to a chromosomal region on chromosome 21. This 

region is referred to as UNK_AL050119. The region is located in an intron of TMEM1 
which encodes transmembrane protein 1. TMEM1 has LocusID: 7109 with reported 
cytogenetic location 21q22.3. TMEM1 gene product is similar to sodium channel proteins. 
[0452] CPS 274 corresponds to UNK_AF052115 (LOC151405) which is a 

hypothetical gene supported by AF052115. The gene has reported cytogenetic location 
2q33.3. LOC151405 gene is located 3' to the polypeptide-coding sequence of ADAM23 
which encodes disintegrin and metalloproteinase domain 23. ADAM23 has LocusID: 8745, 
and is located on chromosome 2 with reported cytogenetic location 2q33. ADAM23 gene 
product is a member of the ADAM protein family. Members of this family are membrane- 
anchored proteins structurally related to snake venom disintegrins, and have been 
implicated in a variety of biologic processes involving cell-cell and cell-matrix interactions, 
including fertilization, muscle development, and neurogenesis. 

[0453] CPS 275 corresponds to MITF which encodes microphthalmia-associated 

transcription factor. The gene has LocusID: 4286, and is located on chromosome 3 with 
reported cytogenetic location 3p14.1-p12.3. MITF gene product contains both basic helix- 
loop-helix and leucine zipper structural features. MITF produces at least two alternate 
transcripts: the M-isoform expressed exclusively in melanocytes, and the A-isoform with a 
broader range of expression. Mutations in MITF may lead to Waardenburg syndrome. 
[0454] CPS 276 corresponds to STAT3 which encodes signal transducer and 

activator of transcription 3 (acute-phase response factor). The gene has LocusID: 6774, and 
is located on chromosome 17 with reported cytogenetic location 17q21. 
[0455] The protein encoded by this gene is a member of the STAT protein family. 

In response to cytokines and growth factors, STAT family members can be phosphorylated 
by the receptor associated kinases, and then form homo- or heterodimers that translocate to 
the cell nucleus where they act as transcription activators. The protein encoded by STAT3 
gene can be activated through phosphorylation in response to various cytokines and growth 
factors including IFNs, EGF, IL5, IL6, HGF, LIF and BMP2. The encoded protein can 
mediate the expression of a variety of genes in response to cell stimuli, and thus plays a role 
in many cellular processes such as cell growth and apoptosis. The small GTPase Rac1 has 
been shown to bind and regulate the activity of this protein. PIAS3 protein is a specific 
inhibitor of this protein. Two alternatively spliced transcript variants encoding distinct 
isoforms have been described. 

[0456] In addition, nucleotides 16 to 2787 of SEQ ID NO: 315 (L29277) have at 

least 95% sequence identity with STAT3. Therefore, SEQ ID NO: 315 (L29277), or the 
complement thereof, can be used to design probes/primers for detecting the expression of 
STAT3. Nucleotides 217 to 1502 of SEQ ID NO: 315 (L29277) have at least 98% sequence 
identity with LOC254114. LOC254114 encodes a protein similar to signal transducer and 
activator of transcription 3 (acute-phase response factor). LOC254114 is located on 
chromosome 17. 
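
As a non-limiting illustration of how a sequence "or the complement thereof" might be used to derive candidate probes, the following Python sketch computes a reverse complement and enumerates fixed-length windows along a sequence. The function names reverse_complement and tile_probes, the default window length of 25 nucleotides, and the toy input sequence are hypothetical choices made for this example. The sketch does not apply specificity, melting-temperature, or secondary-structure screening, which a practical probe or primer design would require.

```python
from typing import Iterator, Tuple

# Translation table for the four bases plus the ambiguity code N.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    """Return the reverse complement of a DNA sequence (A, C, G, T, N only)."""
    seq = seq.upper()
    invalid = set(seq) - set("ACGTN")
    if invalid:
        raise ValueError(f"unexpected characters: {sorted(invalid)}")
    return seq.translate(_COMPLEMENT)[::-1]


def tile_probes(seq: str, length: int = 25, step: int = 1) -> Iterator[Tuple[int, str]]:
    """Yield (1-based start position, window) for each window of `length` nucleotides."""
    if length <= 0 or step <= 0:
        raise ValueError("length and step must be positive")
    for start in range(0, len(seq) - length + 1, step):
        yield start + 1, seq[start:start + length].upper()


if __name__ == "__main__":
    target = "ATGGCCCAATGGAATCAGCTACAGCAGCTT"  # hypothetical toy sequence, not from the sequence listing
    antisense = reverse_complement(target)
    for position, probe in tile_probes(antisense, length=25, step=2):
        print(position, probe)
```

Windows taken from the sequence itself read in the same sense as the listed strand, whereas windows taken from the reverse complement would hybridize to that strand.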

[0457] CPS 277 corresponds to TPD52L2 which encodes tumor protein D52-like 2. 

The gene has LocusID: 7165, and is located on chromosome 20 with reported cytogenetic 
location 20q13.2-q13.3. The gene product is a member of the D52-like family of proteins, 
and may have a role in controlling cell proliferation. The gene product contains coiled-coil 
domains. 

[0458] CPS 278 corresponds to a chromosomal region (referred to as 

UNK_AI732885). This chromosomal region is located in an intron of CG005 which 
encodes a hypothetical protein from BRCA2 region. CG005 has LocusID: 10443 with 
reported cytogenetic location 13q12-q13. CG005 gene product includes a region having 
low similarity to a region of rat 2',3'-cyclic nucleotide 3'-phosphodiesterase (Rn.31762). 
[0459] CPS 279 corresponds to MAP3K8 which encodes mitogen-activated protein 

kinase kinase kinase 8. The gene has LocusID: 1326, and is located on chromosome 10 
with reported cytogenetic location 10p11.2. This gene was identified by its oncogenic 
transforming activity in cells. The encoded protein is a member of the serine/threonine 
protein kinase family. This kinase can activate both the MAP kinase and JNK kinase 
pathways. This kinase was shown to activate IkappaB kinases, and thus induce the nuclear 
production of NF-kappaB. This kinase was also found to promote the production of TNF- 
alpha and IL-2 during T lymphocyte activation. Studies of a similar gene in rat suggested 
the direct involvement of this kinase in the proteolysis of NF-kappaB 1, p105 (NFKB1). 
MAP3K8 gene may also utilize a downstream in-frame translation start codon, and thus 
produce an isoform containing a shorter N-terminus. The shorter isoform has been shown 
to display weaker transforming activity. 

[0460] CPS 280 corresponds to NSP-CL (RTN4) which encodes reticulon 4. The 

gene has LocusID: 57142, and is located on chromosome 2 with reported cytogenetic 
location 2p14-p13. RTN4 gene overlaps LOC200512 on chromosome 2. LOC200512 
encodes a protein similar to reticulon 4. LOC200512 has reported cytogenetic location 
2p16.1. 

[0461] CPS 281 corresponds to NRG1 which encodes neuregulin 1. The gene has 

LocusID: 3084, and is located on chromosome 8 with reported cytogenetic location 8p21- 
p12. Neuregulin 1 was originally identified as a 44-kD glycoprotein that interacts with the 
NEU/ERBB2 receptor tyrosine kinase to increase its phosphorylation on tyrosine residues. 
It is known that an extraordinary variety of different isoforms are produced from the NRG1 
gene by alternative splicing. These isoforms include heregulins (HRGs), glial growth 
factors (GGFs) and sensory and motor neuron-derived factor (SMDF). They are tissue- 
specifically expressed and differ significantly in their structure. The HRG isoforms all 
contain immunoglobulin (Ig) and epidermal growth factor-like (EGF-like) domains. The 
GGF and GGF2 isoforms contain a kringle-like sequence plus Ig and EGF-like domains, 
and the SMDF isoform shares only the EGF-like domain with other isoforms. The 
receptors for all NRG1 isoforms are the ERBB family of tyrosine kinase transmembrane 
receptors. Through interaction with ERBB receptors, NRG1 isoforms may induce the 
growth and differentiation of epithelial, neuronal, glial, and other types of cells. 
[0462] CPS 282 corresponds to RAB31 which encodes RAB31, member RAS 

oncogene family. The gene has LocusID: 11031, and is located on chromosome 18 with 
reported cytogenetic location 18p11.3. The gene product is a GTP-binding protein. 
[0463] CPS 282 also aligns to LOC124146 and LOC200972 with 83% sequence 

identity. LOC124146 has reported cytogenetic location 16q11.2, and encodes a protein 
similar to GTP-binding protein Rab0. LOC200972 is located on chromosome 3, and also 
encodes a protein similar to GTP-binding protein Rab0. 

[0464] CPS 283 corresponds to MEF2D which encodes MADS box transcription 

enhancer factor 2, polypeptide D (myocyte enhancer factor 2D). The gene has LocusID: 
4209, and is located on chromosome 1 with reported cytogenetic location 1q12-q23. The 
gene product is a member of the MADS box family of transcription factors, and may 
regulate muscle-specific and mitogen-inducible genes. 

[0465] CPS 285 corresponds to CXCR4 which encodes chemokine (C-X-C motif) 

receptor 4. The gene has LocusID: 7852, and is located on chromosome 2 with reported 
cytogenetic location 2q21. CXC chemokine receptor (fusin) is a G protein-coupled receptor 
which can mediate intracellular calcium flux. 
[0466] CPS 286 corresponds to M9 which encodes muscle specific gene. The gene 

has LocusID: 27335, and is located on chromosome 19 with reported cytogenetic location 
19q13.2. 

[0467] Nucleotides 109 to 858 of SEQ ID NO: 318 have 88% sequence identity 

with LOC134505 which is similar to muscle specific gene. LOC134505 is located on 
chromosome 5 with reported cytogenetic location 5q15. Nucleotides 100 to 856 of SEQ ID 
NO: 318 also align to a chromosomal region on chromosome 4 with about 85% sequence 
identity. The chromosomal region encompasses LOC152771 which is similar to PRO1474. 
LOC152771 has reported cytogenetic location 4q26. In addition, nucleotides 140 to 799 of 
SEQ ID NO: 318 align to LOC131480 with about 84% sequence identity. LOC131480 
encodes a protein similar to PRO1474, and has reported cytogenetic location 3p24.1. 
[0468] CPS 287 corresponds to FAU which encodes Finkel-Biskis-Reilly murine 

sarcoma virus (FBR-MuSV) ubiquitously expressed (fox derived); ribosomal protein S30. 
The gene has LocusID: 2197, and is located on chromosome 11 with reported cytogenetic 
location 11q13. This gene is the cellular homolog of the fox sequence in the Finkel-Biskis- 
Reilly murine sarcoma virus (FBR-MuSV). It encodes a fusion protein consisting of the 
ubiquitin-like protein fubi at the N terminus and ribosomal protein S30 at the C terminus. It 
has been proposed that the fusion protein is post-translationally processed to generate free 
fubi and free ribosomal protein S30. Fubi is a member of the ubiquitin family, and 
ribosomal protein S30 belongs to the S30E family of ribosomal proteins. Pseudogenes 
derived from this gene are present in the genome. 

[0469] SEQ ID NO: 319 also aligns to FAUP1 with about 92% sequence identity. 

FAUP1 encodes FBR-MuSV-associated ubiquitously expressed (fox derived) pseudogene 
1. The gene has LocusID: 140623, and is located on chromosome 18. Nucleotides 57 to 
351 of SEQ ID NO: 319 have about 84% sequence identity with LOC151661. LOC151661 
encodes a protein similar to ubiquitin-like/S30 ribosomal fusion protein. LOC151661 has 
reported cytogenetic location 3q27.2. In addition, nucleotides 454 to 490 of SEQ ID NO: 
319 align to an intron sequence of RHOBTB1 with 97% sequence identity. RHOBTB1 
encodes Rho-related BTB domain containing 1, and has LocusID: 9886 with reported 
cytogenetic location 10q22.1. 

[0470] CPS 288 corresponds to RPS6 which encodes ribosomal protein S6. The 

gene has LocusID: 6194, and is located on chromosome 9 with reported cytogenetic 
location 9p21. This gene encodes a cytoplasmic ribosomal protein that is a component of 
the 40S subunit of the ribosome. The encoded protein belongs to the S6E family of ribosomal 
proteins. It is the major substrate of protein kinases in the ribosome, with subsets of five C- 
terminal serine residues phosphorylated by different protein kinases. It is reported that 
phosphorylation can be induced by a wide range of stimuli, including growth factors, tumor- 
promoting agents, and mitogens. Dephosphorylation occurs at growth arrest. The encoded 
protein may contribute to the control of cell growth and proliferation through the selective 
translation of particular classes of mRNA. This gene has multiple processed pseudogenes 
dispersed through the genome. 

[0471] Fragments of SEQ ID NO: 320 align to various chromosomal regions with 

about 80-97% sequence identity. These chromosomal regions include, for example, 
LOC205865, LOC137397, LOC253482, and an intron sequence of GCDH. LOC205865 
encodes a protein similar to ribosomal protein S6. The gene has reported cytogenetic 
location 4q21.22. LOC137397 encodes a protein similar to Rim2 protein, and is located at 
chromosome 8q22.3. LOC253482 encodes a protein similar to ribosomal protein S6, and is 
located on chromosome 9. GCDH encodes glutaryl-Coenzyme A dehydrogenase. GCDH 
has LocusID: 2639, and is located at chromosome 19p13.2. 

[0472] CPS 289 corresponds to BAG5 which encodes BCL2-associated athanogene 

5. The gene has LocusID: 9529, and is located on chromosome 14 with reported 
cytogenetic location 14q32.33. The protein encoded by this gene is a member of the BAG1- 
related protein family. BAG1 is believed to be an anti-apoptotic protein that may function 
through interactions with a variety of cell apoptosis and growth related proteins including 
BCL-2, Raf-protein kinase, steroid hormone receptors, growth factor receptors and 
members of the heat shock protein 70 kDa family. The protein encoded by BAG5 gene 
contains a BAG domain near the C-terminus, which may bind and inhibit the chaperone 
activity of Hsc70/Hsp70. 

[0473] Nucleotides 3913 to 4117 of SEQ ID NO: 321 show 82% sequence identity 

with an intron sequence of DNAH11. DNAH11 encodes dynein, axonemal, heavy 
polypeptide 11. The gene has LocusID: 8701, and is reportedly located on chromosome 
7p21. 

[0474] CPS 290 corresponds to UNK_AL022721 (RPL10A) which encodes 

ribosomal protein L10a. RPL10A has LocusID: 4736, and is located on chromosome 6 with 
reported cytogenetic location 6p21.3-p21.2. The gene product is a component of the large 
60S ribosomal subunit. 
[0475] CPS 290 also has 96% sequence identity with LOC253986 and LOC137107, 

both of which encode proteins similar to ribosomal protein L10a. LOC253986 is located on 
chromosome 8, and LOC137107 is located at chromosome 8p11.23. In addition, CPS 290 
has about 90-96% sequence identity with intron sequences of PTPRG, BST1, and MARK3. 
PTPRG encodes protein tyrosine phosphatase, receptor type, G. PTPRG has LocusID: 
5793, and is located at chromosome 3p21-p14. BST1 encodes bone marrow stromal cell 
antigen 1, and has LocusID: 683 with reported cytogenetic location 4p15. MARK3 encodes 
MAP/microtubule affinity-regulating kinase 3, and has LocusID: 4140 with reported 
cytogenetic location 14q32.3. CPS 290 aligns to LOC138030 with 84% sequence identity. 
LOC138030 encodes a protein similar to ribosomal protein L10a, and is located at 
chromosome 8p21.3. 

[0476] CPS 290 (SEQ ID NO: 329) is a spliced product of the complement of 

nucleotides 26623 to 27200 of SEQ ID NO: 322. Blast search against the Entrez human 
genome database shows that SEQ ID NO: 322 has 100% sequence identity with a 
chromosomal region on chromosome 6. This chromosomal region is located within 
Genomic Locus NT_007592, and includes the following genes: TEAD3, RPL10A, FANCE, 
LOC221485, and LOC221486. TEAD3 encodes TEA domain family member 3, and has 
LocusID: 7005. RPL10A encodes ribosomal protein L10a, and has LocusID: 4736. 
FANCE encodes Fanconi anemia, complementation group E, and has LocusID: 2178. 
LOC221485 encodes a protein similar to dJ109F14.3 (PUTATIVE ZNF127 LIKE protein). 
LOC221486 encodes a protein similar to Peroxisome proliferator activated receptor beta 
(PPAR-beta) (PPAR-delta) (Nuclear hormone receptor 1) (NUC1) (NUCI). SEQ ID NO: 
322 aligns to the protein-coding strand of TEAD3, while aligning to the non-protein-coding 
strands of RPL10A, FANCE, LOC221485, and LOC221486. 

[0477] Fragments of SEQ ID NO: 322 show various degrees of sequence identity 

with a plurality of chromosomal regions throughout the human genome. 

[0478] CPS 291 corresponds to DKZP586E0820 (PKD2) which encodes protein 

kinase D2. The gene has LocusID: 25865, and is located on chromosome 19 with reported 
cytogenetic location 19q13.2. The gene product is similar to a region of mu isoforms of 
protein kinase C, and may function to mediate protein-protein and protein-lipid interaction. 
The gene product contains a kinase domain and a pleckstrin homology (PH) domain. 
[0479] CPS 292 corresponds to NONO which encodes non-POU domain containing, 

octamer-binding. The gene has LocusID: 4841, and is located on chromosome X with 
reported cytogenetic location Xq13.1. The gene product is a nuclear protein which contains 
RNA recognition motifs. 

[0480] SEQ ID NO: 324 also aligns to LOC146455 with about 95-96% sequence 

identity. LOC146455 encodes a protein similar to 54 kDa nuclear RNA- and DNA-binding 
protein (p54(nrb)) (p54nrb) (55 kDa nuclear protein) (NMT55) (Non-POU domain-containing 
octamer-binding protein) (DNA-binding P52/P100 complex, 52 kDa subunit). 
LOC146455 is located at chromosome 16q22.3. In addition, nucleotides 514 to 2591 of 
SEQ ID NO: 324 have about 84-85% sequence identity with a chromosomal region which 
overlaps LOC130867. LOC130867 encodes a protein similar to ribosomal protein S12 (40S 
ribosomal protein S12), and is located at chromosome 2p15. 

[0481] CPS 293 corresponds to UNK_AI743507 (ZFR) which encodes zinc finger 

RNA binding protein. ZFR has LocusID: 51663, and is located on chromosome 5 with 
reported cytogenetic location 5p13.2. 

[0482] CPS 293 also shows 92% sequence identity with LOC119355 which encodes 

a protein similar to M-phase phosphoprotein homolog; likely ortholog of mouse zinc finger 
protein Zfr. LOC119355 has reported cytogenetic location 10q23.33. In addition, CPS 293 
has 94-96% sequence identity with a chromosomal region on chromosome 1. The 
chromosomal region is close to TSNAX which encodes translin-associated factor X and has 
LocusID: 7257 and cytogenetic location 1q42.1. Nucleotides 292 to 399 of CPS 293 have 
about 92% sequence identity with a chromosomal region on chromosome 1. 
[0483] CPS 294 corresponds to MAPKAPK5 which encodes mitogen-activated 

protein kinase-activated protein kinase 5. The gene has LocusID: 8550, and is located on 
chromosome 12 with reported cytogenetic location 12q24.12. The protein encoded by this 
gene is a member of the serine/threonine kinase family. In response to cellular stress and 
proinflammatory cytokines, this kinase may be activated through its phosphorylation by 
MAP kinases including MAPK1/ERK, MAPK14/p38-alpha, and MAPK11/p38-beta. At 
least two alternately spliced transcript variants of this gene encoding distinct isoforms have 
been reported. 

[0484] CPS 295 corresponds to UNK_U79297 (LOC157567) which encodes a 

protein similar to hypothetical protein MGC25673. LOC157567 is reportedly located at 
chromosome 8q23.1. 

[0485] The significance of the RCC disease genes listed in Table 4 can be estimated 

using a relative class separation metric according to the supervised classification of RCC 
versus disease-free. See Golub, et al., Science, 286: 531-537 (1999), and Slonim, et al., 
Procs. of the Fourth Annual International Conference on Computational Molecular Biology, 
Tokyo, Japan, April 8-11, p263-272 (2000). A neighborhood analysis can then be 
performed to determine the significance of the measured correlations. For instance, this 
method can randomly permute the 65 total samples (45 RCC patients and 20 disease-free 
humans) into two groups of 45 and 20 samples each and then rank the genes with the 
highest measures of correlation in the 100 randomized sets of samples. This analysis shows 
that a majority of RCC disease genes identified in the present invention possess measures of 
correlation above the 1% significance level compared to randomly permuted class vectors. 
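
The permutation-based neighborhood analysis described above can be sketched as follows. This is a minimal illustration and not necessarily the procedure actually used: it assumes the class separation metric takes the signal-to-noise form of the cited Golub reference (difference of class means divided by the sum of class standard deviations). The function names, the data layout (a dict mapping a gene name to a list of expression values) and the simulated data are hypothetical; the 45/20 group sizes, the 100 permutations and the 1% level mirror the text.

```python
import random
import statistics


def signal_to_noise(values, labels):
    """Class separation score (assumed form): (mean1 - mean0) / (sd1 + sd0)."""
    class1 = [v for v, lab in zip(values, labels) if lab == 1]
    class0 = [v for v, lab in zip(values, labels) if lab == 0]
    spread = statistics.stdev(class1) + statistics.stdev(class0)
    return (statistics.mean(class1) - statistics.mean(class0)) / spread


def permutation_threshold(expr, labels, n_perm=100, level=0.01, seed=0):
    """Return the |score| exceeded by roughly `level` of scores under shuffled labels."""
    rng = random.Random(seed)
    shuffled = list(labels)
    null_scores = []
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # group sizes stay 45 and 20; membership is randomized
        for values in expr.values():
            null_scores.append(abs(signal_to_noise(values, shuffled)))
    null_scores.sort()
    cutoff_index = min(len(null_scores) - 1, int((1.0 - level) * len(null_scores)))
    return null_scores[cutoff_index]


# Hypothetical data: 45 "RCC" samples (label 1) and 20 "disease-free" samples (label 0).
data_rng = random.Random(1)
labels = [1] * 45 + [0] * 20
expr = {"noise_%d" % i: [data_rng.gauss(5.0, 1.0) for _ in labels] for i in range(20)}
expr["shifted_gene"] = [data_rng.gauss(10.0 if lab == 1 else 5.0, 1.0) for lab in labels]

threshold = permutation_threshold(expr, labels)
observed = abs(signal_to_noise(expr["shifted_gene"], labels))
print("1% permutation threshold:", round(threshold, 3))
print("observed score for shifted_gene:", round(observed, 3))
```

A gene whose observed score exceeds the threshold separates the two classes better than about 99% of the scores obtained with randomly permuted class vectors.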
[0486] The biological mechanisms underlying the differential expression patterns of 

the RCC disease genes in the peripheral blood are not fully understood. The differential 
expression patterns may be attributed to the altered gene expression patterns in shed RCC 
tumor cells in the peripheral blood. For instance, Table 5 shows that a subset of the RCC 
disease genes are also differentially expressed in RCC tumor cells compared to PBMCs of 
disease-free humans. The differential expression pattern may also be caused by 
immunogenic reactions induced by RCC tumors. In one experiment, peripheral blood 
mononuclear cells are isolated from disease-free humans and then treated with 
phytohemagglutinin (PHA). PHA stimulation ex vivo appears to recapitulate the differential 
expression pattern of a significant number of the RCC disease genes of this invention, as 
illustrated in Table 5. This suggests that the differential expression patterns of some RCC 
disease genes in the peripheral blood may arise from an activation of leukocytes in vivo. 
[0487] Table 5 further identifies a substantial subset of RCC disease genes that are 

differentially expressed in patients with end-stage renal failure. Therefore, the differential 
expression patterns of this subset of RCC disease genes in the peripheral blood could be due 
to alterations in leukocytes in response to renal dysfunction in RCC patients. 



Table 5. RCC Disease Genes Differentially Expressed Under Other Conditions 

RCC Disease     Entrez          Differentially Expressed in: 
Gene            Accession No.   (compared to disease-free PBMCs) 

IL1R1           M27492          Ex vivo PHA-stimulated PBMCs 
CSF2            M13207          Ex vivo PHA-stimulated PBMCs 
IL1B            [illegible]     Ex vivo PHA-stimulated PBMCs 
Tubulin, Beta   AF141349        Ex vivo PHA-stimulated PBMCs 
[illegible]     [illegible]     Ex vivo PHA-stimulated PBMCs 
[illegible]     [illegible]     Ex vivo PHA-stimulated PBMCs 
[illegible]     [illegible]     Ex vivo PHA-stimulated PBMCs 
[illegible]     [illegible]     Ex vivo PHA-stimulated PBMCs 
[illegible]     [illegible]     Ex vivo PHA-stimulated PBMCs 
[illegible]     [illegible]     Ex vivo PHA-stimulated PBMCs 
[illegible]     [illegible]     Ex vivo PHA-stimulated PBMCs 
[illegible]     [illegible]     Ex vivo PHA-stimulated PBMCs 
[illegible]     X12451          Ex vivo PHA-stimulated PBMCs 
[illegible]     X04430          Ex vivo PHA-stimulated PBMCs 
TUBB            X79535          Ex vivo PHA-stimulated PBMCs 
SCYA7           X72308          Ex vivo PHA-stimulated PBMCs 
DRD2            X51362          Ex vivo PHA-stimulated PBMCs 
SCYA2           M26683          Ex vivo PHA-stimulated PBMCs 
FABP5           M94856          Ex vivo PHA-stimulated PBMCs / RCC Tumor Tissue 
SCYA20          U64197          Ex vivo PHA-stimulated PBMCs / RCC Tumor Tissue 
ADM             D14874          Ex vivo PHA-stimulated PBMCs / RCC Tumor Tissue / Renal Failure PBMCs 
COPEB           AF001461        Ex vivo PHA-stimulated PBMCs / RCC Tumor Tissue / Renal Failure PBMCs 
AQP9            AB008775        Ex vivo PHA-stimulated PBMCs / Renal Failure PBMCs 
PTGS2           U04636          Ex vivo PHA-stimulated PBMCs / Renal Failure PBMCs 
STIP1           M86752          Ex vivo PHA-stimulated PBMCs / Renal Failure PBMCs 
SOD2            X07834          Ex vivo PHA-stimulated PBMCs / Renal Failure PBMCs 
PDXK            U89606          Ex vivo PHA-stimulated PBMCs / Renal Failure PBMCs 
IL1RN           X52015          Ex vivo PHA-stimulated PBMCs / Renal Failure PBMCs 
ANXA5           U05770          Ex vivo PHA-stimulated PBMCs / Renal Failure PBMCs 
IFIT4           AF026939        Ex vivo PHA-stimulated PBMCs / Renal Failure PBMCs 
IL1B            M15330          Ex vivo PHA-stimulated PBMCs / Renal Failure PBMCs 
GRO1            X54489          Ex vivo PHA-stimulated PBMCs / Renal Failure PBMCs 
PLAUR           X74039          Ex vivo PHA-stimulated PBMCs / Renal Failure PBMCs 
NP              X00737          Ex vivo PHA-stimulated PBMCs / Renal Failure PBMCs 
FCGR3B          X16863          RCC Tumor Tissue 
UNK_M62896      M62896          RCC Tumor Tissue 
FN1             X02761          RCC Tumor Tissue 
HMOX1           Z82244          RCC Tumor Tissue 
ITGA7           AF032108        RCC Tumor Tissue 
DGCR5           X91348          RCC Tumor Tissue 
[illegible]     [illegible]     RCC Tumor Tissue 
UNK_AL049250    AL049250        RCC Tumor Tissue 
SLC1A4          AA978353        RCC Tumor Tissue 
MMP9            J05070          RCC Tumor Tissue / Renal Failure PBMCs 
SLC16A3         U81800          RCC Tumor Tissue / Renal Failure PBMCs 
LILRB3          AF025533        RCC Tumor Tissue / Renal Failure PBMCs 
FCGR1A          M63835          RCC Tumor Tissue / Renal Failure PBMCs 
LHFPL2          D86961          RCC Tumor Tissue / Renal Failure PBMCs 
PLEC1           U53204          RCC Tumor Tissue / Renal Failure PBMCs 
S100A11         D38583          RCC Tumor Tissue / Renal Failure PBMCs 
SPOP            AJ000644        RCC Tumor Tissue / Renal Failure PBMCs 
CCR1            D10925          RCC Tumor Tissue / Renal Failure PBMCs 
TLR2            AF051152        RCC Tumor Tissue / Renal Failure PBMCs 
KIAA0750        AB018293        RCC Tumor Tissue / Renal Failure PBMCs 
CDC34           L22005          Renal Failure PBMCs 
POLR2J          L37127          Renal Failure PBMCs 
ETS2            J04102          Renal Failure PBMCs 
MAD             L06895          Renal Failure PBMCs 
GPR3            L32831          Renal Failure PBMCs 
PIP5K1C         AB011161        Renal Failure PBMCs 
PRF1            M28393          Renal Failure PBMCs 
PSMA7           AF054185        Renal Failure PBMCs 
INPP4A          U96919          Renal Failure PBMCs 
TCFL1           D43642          Renal Failure PBMCs 
DGAT            AF059202        Renal Failure PBMCs 
S100P           AA131149        Renal Failure PBMCs 
DOC-1R          AF089814        Renal Failure PBMCs 
C8FW            AJ000480        Renal Failure PBMCs 
PDI2            AB023211        Renal Failure PBMCs 
GEF-2           AI565760        Renal Failure PBMCs 
TNNT1           M19309          Renal Failure PBMCs 
BSG             X64364          Renal Failure PBMCs 
IL17R           U58917          Renal Failure PBMCs 
HK3             U51333          Renal Failure PBMCs 
RALBP1          L42542          Renal Failure PBMCs 
RNASE2          X55988          Renal Failure PBMCs 
TPM1            M19267          Renal Failure PBMCs 
BLVRB           D32143          Renal Failure PBMCs 
APS             AB000520        Renal Failure PBMCs 
[illegible]     [illegible]     Renal Failure PBMCs 
NFE2            S77763          Renal Failure PBMCs 
IL1RAP          AB006537        Renal Failure PBMCs 
ETS2            AF017257        Renal Failure PBMCs 
S100A12         D83664          Renal Failure PBMCs 
CD9             M38690          Renal Failure PBMCs 
ENIGMA          L35240          Renal Failure PBMCs 
HAGH            X90999          Renal Failure PBMCs 
[illegible]     [illegible]     Renal Failure PBMCs 
[illegible]     [illegible]     Renal Failure PBMCs 
ITGA2B          M34480          Renal Failure PBMCs 
FKBP8           L37033          Renal Failure PBMCs 
DUSP6           AB013382        Renal Failure PBMCs 
CBFA2T3         AB010419        Renal Failure PBMCs 



C. Other Solid Tumor Disease Genes 

[0488] The methodologies described in subsection B can be easily adapted to the 

identification of other solid tumor disease genes. These solid tumor disease genes are 
differentially expressed in the peripheral blood or PBMCs of solid tumor patients compared 
to disease-free humans. 

[0489] In one embodiment, the genechip expression data derived from PBMC-enriched 

peripheral blood samples of RCC, prostate cancer, head/neck cancer and disease-free 
humans is collected, compared and analyzed using a multi-class correlation metric. 
The multi-class correlation metric can identify and rank the genes most highly correlated 
with each class of the peripheral blood gene expression profiles. Suitable tools for applying 
multi-class correlation metrics include, but are not limited to, the GeneCluster 2 software 
provided by the MIT Center for Genome Research at Whitehead Institute (Cambridge, MA). The 
GeneCluster 2 software has supervised classification, gene selection and permutation test 
functions. It includes algorithms for building and testing supervised models using weighted 
voting and k-nearest neighbors algorithms. 
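
For illustration, a k-nearest neighbors classifier of the general kind mentioned above can be sketched as follows. This is a generic textbook version and is not the GeneCluster 2 implementation: the Euclidean distance, the choice of k = 3, the function name and the two-gene toy profiles are all assumptions made for the example.

```python
import math
from collections import Counter


def knn_predict(training, profile, k=3):
    """Assign the majority class among the k training profiles closest to `profile`.

    training: list of (expression_profile, class_label) pairs.
    profile:  list of expression values, one per classifier gene.
    """
    nearest = sorted(training, key=lambda item: math.dist(item[0], profile))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]


# Hypothetical two-gene expression profiles for the four classes named in the text.
training = [
    ([1.0, 1.0], "disease-free"), ([1.2, 0.9], "disease-free"), ([0.8, 1.1], "disease-free"),
    ([5.0, 1.0], "RCC"), ([5.2, 1.1], "RCC"), ([4.9, 0.8], "RCC"),
    ([1.0, 5.0], "prostate cancer"), ([1.1, 5.2], "prostate cancer"), ([0.9, 4.8], "prostate cancer"),
    ([5.0, 5.0], "head/neck cancer"), ([5.1, 4.9], "head/neck cancer"), ([4.8, 5.2], "head/neck cancer"),
]

predicted = knn_predict(training, [5.1, 1.0])
print(predicted)
```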

[0490] In one example, a 20-gene set is selected using 70% of the expression 

profiles as a training set. These 20 multi-class classifier genes are listed in Table 10. Each 
of these 20 genes has a differential expression pattern in the peripheral blood of all three 
classes of solid tumor patients (i.e., RCC, prostate cancer and head/neck cancer) as 
compared to disease-free humans. The gene set has over 89% prediction accuracy for each 
remaining profile. Other gene sets with high prediction accuracy for RCC, prostate cancer, 
head/neck cancer and disease-free can be similarly obtained. 
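
The train-then-test scheme described above (a gene set chosen on 70% of the profiles and evaluated on the remainder) can be sketched as follows. This is a simplified illustration under stated assumptions: the function names, the random shuffling, the one-nearest-neighbor predictor and the toy profiles are hypothetical and are not taken from the study.

```python
import math
import random


def split_profiles(profiles, train_fraction=0.7, seed=0):
    """Shuffle the labeled profiles and split them into training and held-out sets."""
    rng = random.Random(seed)
    shuffled = list(profiles)
    rng.shuffle(shuffled)
    cut = round(train_fraction * len(shuffled))
    return shuffled[:cut], shuffled[cut:]


def holdout_accuracy(predict, train, held_out):
    """Fraction of held-out profiles whose predicted class equals the true class."""
    correct = sum(1 for profile, label in held_out if predict(train, profile) == label)
    return correct / len(held_out)


def nearest_neighbor(train, profile):
    """Stand-in predictor: label of the single closest training profile."""
    return min(train, key=lambda item: math.dist(item[0], profile))[1]


# Hypothetical one-gene profiles: two well-separated classes of ten samples each.
profiles = [([float(i)], "disease-free") for i in range(10)]
profiles += [([float(i) + 100.0], "RCC") for i in range(10)]

train, held_out = split_profiles(profiles)
accuracy = holdout_accuracy(nearest_neighbor, train, held_out)
print(len(train), len(held_out), accuracy)
```

Because the held-out profiles play no part in selecting the genes or fitting the predictor, the accuracy measured on them estimates how the gene set would perform on new samples.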

[0491] In another embodiment, a multi-class correlation metric is used to identify 

genes capable of predicting solid tumor versus solid tumor-free, regardless of the particular 
type of the solid tumor. The peripheral blood gene expression profiles from RCC, prostate 
cancer, head/neck cancer, and disease-free humans are analyzed using multi-class 
comparison. A 19-gene set is selected using 70% of the total samples as a training set. The 
gene set thus selected is listed in Table 11. Each gene in the gene set is differentially 
expressed in the peripheral blood of all three types of solid tumor patients (RCC, prostate 
cancer, and head/neck cancer) as compared to disease-free humans. This 19-gene set is 
capable of accurately predicting solid tumor versus solid tumor-free for the remaining 
expression profiles. Other gene sets with high prediction accuracy for solid tumor versus 
solid tumor-free can be similarly obtained. 

D. Detecting RCC, RCC-Free, Solid Tumor and/or Solid Tumor-Free 

[0492] The RCC disease genes identified in Table 4 can be used to detect RCC, 

RCC-free, solid tumors, and/or solid tumor-free in a human subject with unknown disease 
status. For instance, if the expression patterns of one or more RCC disease genes in the 
peripheral blood sample of the human subject are not substantially different from the 
corresponding expression patterns in disease-free humans, then it is suggestive that the 
human subject under diagnosis is RCC-free. Conversely, if the expression patterns of one 
or more RCC disease genes in the human subject are substantially different from the 
corresponding expression patterns in disease-free humans (e.g., gene expression levels in 
the human subject are substantially higher or lower than those in disease-free humans), then 
it is suggestive that the human subject may have RCC (or other solid tumors, 
depending on the genes used in the diagnosis). Algorithms, such as the weighted voting 
programs, can be used to facilitate the diagnosis. In addition, other clinical evidence can be 
combined with the gene-based test to reduce the risk of false diagnosis. Similar approaches 
can be used to predict the presence or absence of other solid tumors such as prostate cancer 
and head/neck cancer. 
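
As a hedged illustration of how a weighted voting program might classify a sample of unknown status, the sketch below follows the weighted voting scheme of the Golub reference cited earlier in this document: each gene casts a vote equal to its weight times the distance of the sample's expression level from the midpoint between the two class means. Whether the programs referred to above compute votes in exactly this way is an assumption, and the gene names, weights and midpoints are hypothetical.

```python
def weighted_vote(sample, gene_params):
    """Classify one sample by weighted voting.

    sample:      {gene: expression level in the sample under diagnosis}
    gene_params: {gene: (weight, midpoint)}; a positive weight means higher
                 expression in RCC than in disease-free training samples.
    Returns (predicted class, prediction strength between 0 and 1).
    """
    votes_rcc = 0.0
    votes_free = 0.0
    for gene, (weight, midpoint) in gene_params.items():
        vote = weight * (sample[gene] - midpoint)
        if vote >= 0:
            votes_rcc += vote
        else:
            votes_free += -vote
    total = votes_rcc + votes_free
    strength = abs(votes_rcc - votes_free) / total if total else 0.0
    label = "RCC" if votes_rcc > votes_free else "RCC-free"
    return label, strength


# Hypothetical parameters for two genes (names are placeholders, not Table 4 genes).
gene_params = {"GENE_UP": (2.0, 5.0), "GENE_DOWN": (-1.0, 3.0)}

label, strength = weighted_vote({"GENE_UP": 6.0, "GENE_DOWN": 4.0}, gene_params)
print(label, round(strength, 3))
```

A low prediction strength indicates that the genes disagree, in which case the other clinical evidence mentioned above becomes more important.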






[0493] Diagnostic or screening methods based on differentially expressed gene 

products are well known in the art. In accordance with one aspect of the present invention, 
the differential expression patterns of RCC disease genes can be determined by measuring 
the levels of RNA transcripts of these genes in peripheral blood samples. Suitable methods 
for this purpose include, but are not limited to, RT-PCR, Northern Blot, in situ 
hybridization, Southern Blot, slot-blotting, nuclease protection assay and polynucleotide 
arrays. The peripheral blood samples can be either whole blood, or blood samples 
containing enriched PBMCs. 

[0494] In general, RNA isolated from peripheral blood samples can be amplified to 

cDNA or cRNA before detection and/or quantitation. The isolated RNA can be either total 
RNA or mRNA. The RNA amplification can be specific or non-specific. Suitable 
amplification methods include, but are not limited to, reverse transcriptase PCR, isothermal 
amplification, ligase chain reaction, and Qbeta replicase. The amplified nucleic acid 
products can be detected and/or quantitated through hybridization to labeled probes. 
[0495] Amplification primers or hybridization probes for an RCC disease gene can 

be prepared from the gene sequence or its corresponding CPS using methods well known in 
the art. Gene sequences suitable for this purpose include, but are not limited to, exons, 
introns, or the 3' or 5' untranslated regions, or any combination thereof. In one 
embodiment, the probes or primers are designed based on the sequence in or near the 3' 
protein-coding region of the RCC disease gene. For instance, the nucleotide sequence 
encoding the last 100 to 300 amino acid residues in the C-terminus region of the RCC 
disease gene product can be selected to design probes or primers. In the case that the 
genomic location(s) of the RCC disease gene has not been determined or that the gene may 
correspond to multiple genomic loci, the probes/primers can be designed based on the CPS 
of the gene, or the oligonucleotide probes on the HG-U95Av2 gene chip that was used for 
the identification of the gene. 

[0496] Table 4 lists sequences suitable for making probes/primers for the detection 

of their corresponding RCC disease genes. Examples of suitable oligonucleotide 
probes/primers are listed in ATTACHMENT A. 

[0497] In one embodiment, each probe/primer comprises at least 15 nucleotides. 

For instance, each probe can comprise at least 20, 25, 50, 75, 100, 125, 150, 175, 200, 225, 
250, 275, 300, 325, 350, 400 or more nucleotides. Preferably, each probe/primer has 
relatively high sequence complexity and does not have any ambiguous residue 
(undetermined "n" residues). The probes/primers preferably can hybridize to the target 
gene, including its RNA transcripts, under stringent or highly stringent conditions. 
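
The length, ambiguity and complexity criteria above can be screened computationally. The following is a minimal sketch under stated assumptions: the text does not define how sequence complexity is measured, so Shannon entropy of the base composition and the 1.5-bit cutoff are illustrative choices, and the function names are hypothetical.

```python
import math
from collections import Counter


def shannon_entropy(seq):
    """Base-composition entropy in bits (2.0 for equal A/C/G/T, 0.0 for a homopolymer)."""
    counts = Counter(seq)
    n = len(seq)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def passes_probe_filters(seq, min_len=15, min_entropy=1.5):
    """True if the candidate has at least 15 nucleotides, no ambiguous residues
    and (by the assumed entropy measure) relatively high sequence complexity."""
    s = seq.upper()
    if len(s) < min_len:
        return False
    if any(base not in "ACGT" for base in s):  # rejects undetermined "n" residues
        return False
    return shannon_entropy(s) >= min_entropy


print(passes_probe_filters("ACGTACGTACGTACGTACGT"))
print(passes_probe_filters("ACGTNACGTACGTACGTACG"))
```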
[0498] In another embodiment, the probes/primers for a gene are selected from 

regions which significantly diverge from the sequences of other genes. Such regions can be 
determined by checking the probe/primer sequences against a human genome sequence 
database, such as the Entrez database at the NCBI. One algorithm suitable for this purpose 
is the BLAST algorithm. This algorithm involves first identifying high scoring sequence 
pairs (HSPs) by identifying short words of length W in the query sequence, which either 
match or satisfy some positive-valued threshold score T when aligned with a word of the 
same length in a database sequence. T is referred to as the neighborhood word score 
threshold. These initial neighborhood word hits act as seeds for initiating searches to find 
longer HSPs containing them. The word hits are then extended in both directions along 
each sequence to increase the cumulative alignment score. Cumulative scores are calculated 
using, for nucleotide sequences, the parameters M (reward score for a pair of matching 
residues; always >0) and N (penalty score for mismatching residues; always <0). Extension 
of the word hits in each direction is halted, for example, when the cumulative alignment 
score falls off by the quantity X from its maximum achieved value. The 
BLAST algorithm parameters W, T, and X determine the sensitivity and speed of the 
alignment. These parameters can be adjusted for different purposes, as appreciated by one 
of ordinary skill in the art. 
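
The seed-and-extend idea described above can be illustrated with a simplified sketch. It is not the BLAST program itself: seeds here are exact word matches of length W (the threshold T is not modeled), only rightward ungapped extension is shown, and the values used for W, M, N and X are arbitrary. All function and parameter names are hypothetical.

```python
def find_seeds(query, subject, w):
    """Return (query_pos, subject_pos) pairs where a word of length w matches exactly."""
    index = {}
    for j in range(len(subject) - w + 1):
        index.setdefault(subject[j:j + w], []).append(j)
    seeds = []
    for i in range(len(query) - w + 1):
        for j in index.get(query[i:i + w], []):
            seeds.append((i, j))
    return seeds


def extend_right(query, subject, i, j, w, match=1, mismatch=-3, x_drop=5):
    """Extend a seed to the right without gaps.

    match/mismatch play the roles of M and N; extension stops once the running
    score has fallen x_drop (X) or more below the best score seen so far.
    Returns (best_score, length of the best-scoring extension including the seed).
    """
    score = best = w * match
    best_len = w
    k = w
    while i + k < len(query) and j + k < len(subject):
        score += match if query[i + k] == subject[j + k] else mismatch
        k += 1
        if score > best:
            best, best_len = score, k
        elif best - score >= x_drop:
            break
    return best, best_len


query = "ACGTACGTTT"
subject = "GGACGTACGTAA"
seeds = find_seeds(query, subject, 4)
best_score, best_length = extend_right(query, subject, 0, 2, 4)
print(seeds[0], best_score, best_length)
```

A probe or primer candidate that yields long, high-scoring matches to genes other than its intended target would be a poor choice under the selection criterion of this embodiment.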

[0499] In a preferred embodiment, quantitative RT-PCR (such as TaqMan, ABI) is 

used for detecting and comparing the levels of RNA transcripts of the RCC disease genes in 
peripheral blood samples. Quantitative RT-PCR involves reverse transcription (RT) of 
RNA to cDNA followed by relative quantitative PCR (RT-PCR). 

[0500] In PCR, the number of molecules of the amplified target DNA increases by a 

factor approaching two with every cycle of the reaction until some reagent becomes 
limiting. Thereafter, the rate of amplification becomes increasingly diminished until there 
is not an increase in the amplified target between cycles. If one plots a graph on which the 
cycle number is on the X axis and the log of the concentration of the amplified target DNA 
is on the Y axis, one observes that a curved line of characteristic shape is formed by 
connecting the plotted points. Beginning with the first cycle, the slope of the line is positive 
and constant. This is said to be the linear portion of the curve. After some reagent becomes 
limiting, the slope of the line begins to decrease and eventually becomes zero. At this point 
the concentration of the amplified target DNA becomes asymptotic to some fixed value. 
This is said to be the plateau portion of the curve. 
[0501] The concentration of the target DNA in the linear portion of the PCR is 

proportional to the starting concentration of the target before the PCR was begun. By 
determining the concentration of the PCR products of the target DNA in PCR reactions that 
have completed the same number of cycles and are in their linear ranges, it is possible to 
determine the relative concentrations of the specific target sequence in the original DNA 
mixture. If the DNA mixtures are cDNAs synthesized from RNAs isolated from different 
tissues or cells, the relative abundances of the specific mRNA from which the target 
sequence was derived may be determined for the respective tissues or cells. This direct 
proportionality between the concentration of the PCR products and the relative mRNA 
abundances is true in the linear range portion of the PCR reaction. 
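
The proportionality stated above can be written out explicitly. The symbols are introduced here for illustration and are assumptions, not notation from the original: N_0 is the starting amount of target, E is the per-cycle amplification efficiency (E approaching 1 corresponds to the "factor approaching two"), c is the cycle number, and A and B denote two samples amplified with the same efficiency and sampled at the same cycle within the linear range.

```latex
N_c = N_0\,(1+E)^{c}
\quad\Longrightarrow\quad
\log N_c = \log N_0 + c\,\log(1+E)

\frac{N_c^{(A)}}{N_c^{(B)}}
  = \frac{N_0^{(A)}\,(1+E)^{c}}{N_0^{(B)}\,(1+E)^{c}}
  = \frac{N_0^{(A)}}{N_0^{(B)}}
```

The first line shows why the log-scale plot is a straight line with constant slope log(1+E); the second shows that the ratio of product amounts equals the ratio of starting amounts as long as both reactions share the same efficiency.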

[0502] The final concentration of the target DNA in the plateau portion of the curve 

is determined by the availability of reagents in the reaction mix and is independent of the 
original concentration of target DNA. Therefore, the sampling and quantifying of the 
amplified PCR products preferably are carried out when the PCR reactions are in the linear 
portion of their curves. In addition, relative concentrations of the amplifiable cDNAs 
preferably are normalized to some independent standard, which may be based on either 
internally existing RNA species or externally introduced RNA species. The abundance of a 
particular mRNA species may also be determined relative to the average abundance of all 
mRNA species in the sample. 
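
The contrast between the linear and plateau portions of the curve can be illustrated with a small simulation. This is a toy model and an assumption made for the example: the per-cycle efficiency is taken to decline linearly as product approaches a fixed capacity that stands in for the limiting reagent, and the starting copy numbers and capacity are arbitrary.

```python
def simulate_pcr(start_copies, cycles, capacity=1e12):
    """Return the product amount after each cycle (index 0 is the starting amount).

    Efficiency is near 1 (doubling) while product is scarce and falls to 0 as the
    product approaches `capacity`, which produces a plateau.
    """
    amounts = [float(start_copies)]
    for _ in range(cycles):
        n = amounts[-1]
        efficiency = max(0.0, 1.0 - n / capacity)
        amounts.append(n * (1.0 + efficiency))
    return amounts


low = simulate_pcr(1e3, 40)    # sample with less starting target
high = simulate_pcr(1e4, 40)   # sample with ten times more starting target

ratio_linear = high[15] / low[15]    # both reactions still in the linear portion
ratio_plateau = high[40] / low[40]   # both reactions at the plateau
print(round(ratio_linear, 3), round(ratio_plateau, 3))
```

In this model the tenfold difference in starting material is preserved at cycle 15 but has disappeared by cycle 40, which is why sampling in the linear portion is preferred.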

[0503] In one embodiment, the PCR amplification utilizes internal PCR standards 

that are approximately as abundant as the target. This strategy is effective if the products of 
the PCR amplifications are sampled during their linear phases. If the products are sampled 
when the reactions are approaching the plateau phase, then the less abundant product may 
become relatively over-represented. Comparisons of relative abundances made for many 
different RNA samples, such as is the case when examining RNA samples for differential 
expression, may become distorted in such a way as to make differences in relative 
abundances of RNAs appear less than they actually are. This can be improved if the 
internal standard is much more abundant than the target. If the internal standard is more 
abundant than the target, then direct linear comparisons may be made between RNA 
samples. 

[0504] A problem inherent in clinical samples is that they are of variable quantity 

and/or quality. This problem can be overcome if the RT-PCR is performed as a relative 
quantitative RT-PCR with an internal standard in which the internal standard is an 
amplifiable cDNA fragment that is larger than the target cDNA fragment and in which the 
abundance of the mRNA encoding the internal standard is roughly 5-100 fold higher than 
the mRNA encoding the target. This assay measures relative abundance, not absolute 
abundance of the respective mRNA species. 
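
The way an internal standard compensates for variable sample quantity can be shown with a short calculation. The sketch below assumes that the target and the internal standard are measured in the same reaction and that both signals scale with the amount of input RNA; the function name and the signal values are hypothetical.

```python
def relative_abundance(target_signal, standard_signal):
    """Target signal expressed relative to the internal standard from the same sample."""
    if standard_signal <= 0:
        raise ValueError("internal standard signal must be positive")
    return target_signal / standard_signal


# Hypothetical signals: sample B contains half as much input RNA as sample A,
# so both its target and its internal standard signals are halved.
sample_a = relative_abundance(target_signal=200.0, standard_signal=4000.0)
sample_b = relative_abundance(target_signal=100.0, standard_signal=2000.0)
print(sample_a, sample_b)
```

The two samples give the same relative abundance even though their absolute signals differ, consistent with the statement that the assay measures relative rather than absolute abundance.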

[0505] In another embodiment, the relative quantitative RT-PCR uses an external 

standard protocol. Under this protocol, the PCR products are sampled in the linear portion 
of their amplification curves. The number of PCR cycles that are optimal for sampling can 
be empirically determined for each target cDNA fragment. In addition, the reverse 
transcriptase products of each RNA population isolated from the various samples can be 
normalized for equal concentrations of amplifiable cDNAs. While empirical determination 
of the linear range of the amplification curve and normalization of cDNA preparations are 
tedious and time-consuming processes, the resulting RT-PCR assays may, in certain cases, 
be superior to those derived from a relative quantitative RT-PCR with an internal standard. 
[0506] Nucleic acid arrays can also be used to detect and compare the differential 

expression patterns of RCC disease genes in peripheral blood samples. The probes suitable 
for detecting the corresponding RCC disease genes can be stably attached to known discrete 
regions on a solid substrate. As used herein, a probe is "stably attached" to a discrete region 
if the probe maintains its position relative to the discrete region during the hybridization and 
the subsequent washes. Construction of nucleic acid arrays is well known in the art. 
Suitable substrates for making polynucleotide arrays include, but are not limited to, 
membranes, films, plastics and quartz wafers. 

[0507] A nucleic acid array of the present invention can comprise at least 2, 5, 10, 

15, 20, 25, 30, 35, 40, 45, 50, 60, 70, 80, 90, 100, 150, 200, 250 or more different 
polynucleotide probes, each different probe capable of hybridizing to a different respective 
RCC disease gene. Multiple probes for the same gene can be used on a single nucleic acid 
array. Examples of probes suitable for this invention are listed in ATTACHMENT A. 
Probes for other disease genes can also be included in the nucleic acid array of this 
invention. The probe density on the array can be in any range. For instance, the density 
may be 50, 100, 200, 300, 400, 500 or more probes/cm². 

[0508] In one embodiment, nuclease protection assays are used to quantify RNAs 

derived from the peripheral blood samples. There are many different versions of nuclease 
protection assays known to those practiced in the art. The common characteristic of these 
nuclease protection assays is that they involve hybridization of an antisense nucleic acid 
with the RNA to be quantified. The resulting hybrid double-stranded molecule is then 
digested with a nuclease that digests single-stranded nucleic acids more efficiently than 
double-stranded molecules. The amount of antisense nucleic acid that survives digestion is 
a measure of the amount of the target RNA species to be quantified. An example of a 
nuclease protection assay that is commercially available is the RNase protection assay 
manufactured by Ambion, Inc. (Austin, Texas). 

[0509] The above-described methods can also be used to determine the levels of 

RNA species in the peripheral blood that are capable of hybridizing to the CPSs listed in 
CPS-Table-2. The levels of these RNA species in the peripheral blood can be indicative as 
to whether a human subject has RCC or is RCC-free. 

[0510] In accordance with another aspect of the present invention, the differential 

expression patterns of RCC disease genes can be determined by measuring the levels of 
polypeptides encoded by these genes in peripheral blood. Methods suitable for this purpose 
include, but are not limited to, immunoassays such as ELISA, RIA, FACS, dot blot, 
Western Blot, immunohistochemistry, and antibody-based radioimaging. Protocols for 
carrying out these immunoassays are well known in the art. Other methods such as 
2-dimensional SDS-polyacrylamide gel electrophoresis can also be used. 

[0511] One exemplary method suitable for detecting the levels of target proteins in 

peripheral blood samples is ELISA. In an exemplifying ELISA, antibodies capable of 
binding to the target proteins encoded by one or more RCC disease genes are immobilized 
onto a selected surface exhibiting protein affinity, such as wells in a polystyrene or 
polyvinyl chloride microtiter plate. Then, peripheral blood samples to be tested are added to 
the wells. After binding and washing to remove non-specifically bound immunocomplexes, 
the bound antigen(s) can be detected. Detection can be achieved by the addition of a second 
antibody which is specific for the target proteins and is linked to a detectable label. 
Detection may also be achieved by the addition of a second antibody, followed by the 
addition of a third antibody that has binding affinity for the second antibody, with the third 
antibody being linked to a detectable label. Before being added to the microtiter plate, cells 
in the peripheral blood samples can be lysed using various methods known in the art. 
Proper extraction procedures can be used to separate the target proteins from potentially 
interfering substances. 

[0512] In another exemplifying ELISA, the peripheral blood samples suspected of 

containing the target proteins are immobilized onto the well surface and then contacted with 
the antibodies of the invention. After binding and washing to remove non-specifically 
bound immunocomplexes, the bound antigen is detected. Where the initial antibodies are 
linked to a detectable label, the immunocomplexes can be detected directly. The 
immunocomplexes can also be detected using a second antibody that has binding affinity for 
the first antibody, with the second antibody being linked to a detectable label. 
[0513] Another exemplary ELISA involves the use of antibody competition in the 

detection. In this ELISA, the target proteins are immobilized on the well surface. The 
labeled antibodies are added to the well, allowed to bind to the target proteins, and detected 
by means of their labels. The amount of the target proteins in an unknown sample is then 
determined by mixing the sample with the labeled antibodies before or during incubation 
with coated wells. The presence of the target proteins in the unknown sample acts to reduce 
the amount of antibody available for binding to the well and thus reduces the ultimate 
signal. 

[0514] Different ELISA formats can have certain features in common, such as 

coating, incubating or binding, washing to remove non-specifically bound species, and 
detecting the bound immunocomplexes. For instance, in coating a plate with either antigen 
or antibody, the wells of the plate can be incubated with a solution of the antigen or 
antibody, either overnight or for a specified period of hours. The wells of the plate are then 
washed to remove incompletely adsorbed material. Any remaining available surfaces of the 
wells are then "coated" with a nonspecific protein that is antigenically neutral with regard to 
the test samples. Examples of these nonspecific proteins include bovine serum albumin 
(BSA), casein and solutions of milk powder. The coating allows for blocking of 
nonspecific adsorption sites on the immobilizing surface and thus reduces the background 
caused by nonspecific binding of antisera onto the surface. 

[0515] In ELISAs, a secondary or tertiary detection means can also be used. After 

binding of a protein or antibody to the well, coating with a non-reactive material to reduce 
background, and washing to remove unbound material, the immobilizing surface is 
contacted with the control and/or clinical or biological sample to be tested under conditions 
effective to allow immunocomplex (antigen/antibody) formation. These conditions may 
include, for example, diluting the antigens and antibodies with solutions such as BSA, 
bovine gamma globulin (BGG) and phosphate buffered saline (PBS)/Tween and incubating 
the antibodies and antigens at room temperature for about 1 to 4 hours or at 4°C overnight. 
Detection of the immunocomplex then requires a labeled secondary binding ligand or 
antibody, or a secondary binding ligand or antibody in conjunction with a labeled tertiary 
antibody or third binding ligand. 

[0516] Following all incubation steps in an ELISA, the contacted surface can be 

washed so as to remove non-complexed material. For instance, the surface may be washed 
with a solution such as PBS/Tween, or borate buffer. Following the formation of specific 
immunocomplexes between the test sample and the originally bound material, and 
subsequent washing, the occurrence or the amount of immunocomplexes can be determined. 
[0517] To provide a detecting means, the second or third antibody can have an 

associated label to allow detection. In one embodiment, the label is an enzyme that 
generates color development upon incubating with an appropriate chromogenic substrate. 
Thus, for example, one may contact and incubate the first or second immunocomplex with a 
urease, glucose oxidase, alkaline phosphatase or hydrogen peroxidase-conjugated antibody 
for a period of time and under conditions that favor the development of further 
immunocomplex formation (e.g., incubation for 2 hours at room temperature in a PBS- 
containing solution such as PBS-Tween). 

[0518] After incubation with the labeled antibody, and subsequent to washing to 

remove unbound material, the amount of label is quantified, e.g., by incubation with a 
chromogenic substrate such as urea and bromocresol purple or 2,2'-azino-di-(3-ethyl- 
benzthiazoline-6-sulfonic acid) (ABTS) and H2O2, in the case of peroxidase as the enzyme 
label. Quantitation can be achieved by measuring the degree of color generation, e.g., using 
a spectrophotometer. 

[0519] Another method suitable for this invention is RIA (radioimmunoassay). An 

exemplary RIA is based on the competition between radiolabeled-polypeptides and 
unlabeled polypeptides for binding to a limited quantity of antibodies. Suitable radiolabels 
include, but are not limited to, I¹²⁵. In one embodiment, a fixed concentration of I¹²⁵-labeled 
polypeptide is incubated with a series of dilutions of an antibody specific to the polypeptide. 
When the unlabeled polypeptide is added to the system, the amount of the I¹²⁵-polypeptide 
that binds to the antibody is decreased. A standard curve can therefore be constructed to 
represent the amount of antibody-bound I¹²⁵-polypeptide as a function of the concentration 
of the unlabeled polypeptide. From this standard curve, the concentration of the 
polypeptide in unknown samples can be determined. Various protocols for conducting RIA 
to measure the levels of polypeptides in peripheral blood samples are well known in the art. 
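
Reading an unknown sample off such a standard curve can be sketched as below. This is a minimal illustration only: the calibration values and the function name `interpolate_concentration` are hypothetical, and simple linear interpolation between calibration points is assumed in place of the nonlinear curve fitting that is commonly used in practice.

```python
def interpolate_concentration(standard_curve, bound_signal):
    """Estimate the unlabeled-polypeptide concentration from a bound-label signal.

    standard_curve: list of (concentration, bound_signal) pairs in which the
    signal falls as the concentration rises (competitive binding).
    """
    points = sorted(standard_curve)  # ascending concentration
    for (c_lo, s_lo), (c_hi, s_hi) in zip(points, points[1:]):
        # Within this interval the signal drops from s_lo to s_hi.
        if s_hi <= bound_signal <= s_lo:
            if s_lo == s_hi:
                return c_lo
            fraction = (s_lo - bound_signal) / (s_lo - s_hi)
            return c_lo + fraction * (c_hi - c_lo)
    raise ValueError("signal lies outside the calibrated range")


# Hypothetical calibration: (concentration in ng/mL, bound counts per minute)
CURVE = [(0.0, 10000.0), (1.0, 8000.0), (5.0, 4000.0), (25.0, 1000.0)]
estimate = interpolate_concentration(CURVE, 6000.0)
```

A signal outside the calibrated range raises an error rather than extrapolating, since the curve carries no information there.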
[0520] Suitable antibodies for this invention include, but are not limited to, 

polyclonal antibodies, monoclonal antibodies, chimeric antibodies, humanized antibodies, 
single chain antibodies, Fab fragments, and fragments produced by a Fab expression library. 
Neutralizing antibodies (i.e., those which inhibit dimer formation) can also be used. 
[0521] Polyclonal antibodies can be prepared by immunizing a suitable subject with 

RCC disease gene products or fragments thereof. The antibody titer in the immunized 
subject can be monitored over time using standard techniques, such as ELISA. The 
antibodies can be isolated from the immunized subject using techniques well known in the 
art. 

[0522] In one embodiment, hybridomas capable of producing antibodies against 

RCC disease gene products are prepared. RCC disease gene products can be prepared using 
bacteria or other cells transformed or transfected with the polynucleotide sequences 
encoding the gene products. The purified gene products, or fragments thereof, are used to 
immunize a vertebrate, such as a mammal. Suitable mammals include mice, rabbits and 
sheep. Preferably, the fragment used for immunization comprises at least 8 amino acid 
residues, more preferably at least 12 amino acid residues, highly preferably at least 16 
amino acid residues, and most preferably at least 20 amino acid residues. 
[0523] Immunogenic fragments (epitopes) in the gene products can be identified 

using known techniques. Preferred epitopes are regions that are located on the surfaces of 
the gene products. These regions are usually hydrophilic. 

[0524] Splenocytes are isolated from the immunized vertebrate and fused with an 

immortalized cell line (such as a myeloma) to form hybridomas. Preferably, the immortal 
cell line is derived from the same mammalian species as the lymphocytes. For example, 
murine hybridomas can be made by fusing an immortalized mouse cell line with 
lymphocytes isolated from a mouse that is immunized with an immunogenic preparation of 
the present invention. Preferred immortalized cell lines include mouse myeloma cell lines 
that are sensitive to culture medium containing hypoxanthine, aminopterin and thymidine 
("HAT medium"). Suitable myeloma cell lines include, but are not limited to, the 
P3-NS1/1-Ag4-1, P3-x63-Ag8.653 or Sp2/0-Ag14 myeloma lines, all of which are available 
from ATCC. In one embodiment, HAT-sensitive mouse myeloma cells are fused to mouse 
splenocytes using polyethylene glycol ("PEG"). Hybridoma cells thus produced are 
selected using HAT medium, which kills unfused or unproductively fused myeloma cells. 
Hybridoma cells which produce monoclonal antibodies against the RCC disease gene 
products can be detected by screening the hybridoma culture supernatants. 
[0525] Monoclonal antibodies can also be prepared by screening a recombinant 

combinatorial immunoglobulin library (e.g., an antibody phage display library). Kits for 
generating and screening phage display libraries are commercially available (e.g., the 
Pharmacia Recombinant Phage Antibody System, Catalog No. 27-9400-01; and the 
Stratagene SurfZAP™ Phage Display Kit, Catalog No. 240612). 

[0526] The antibodies suitable for this invention also include "single-chain Fv" or 

"scFv." The scFv fragments comprise the VH and VL domains of an antibody. Generally, 
the scFv polypeptide further comprises a polypeptide linker between the VH and VL 
domains. The polypeptide linker enables the scFv to form the desired structure for antigen 
binding. Additionally, recombinant antibodies, such as chimeric and humanized 
monoclonal antibodies, can be prepared, as appreciated by one of ordinary skill in the art. 
[0527] Humanized antibodies can also be used. Humanized forms of non-human 

(e.g., murine) antibodies are chimeric immunoglobulins, immunoglobulin chains, or 
fragments thereof (such as Fv, Fab, Fab', F(ab')2 or other antigen-binding subsequences of 
antibodies) which contain minimal sequence derived from non-human immunoglobulin. 
Humanized antibodies are derived from human immunoglobulins in which the residues 
forming the complementarity determining regions (CDRs) are replaced by the residues from 
CDRs of a non-human antibody, such as a mouse, rat or rabbit antibody having the desired 
specificity, affinity and capacity. In some instances, Fv framework residues of the human 
immunoglobulin are replaced by corresponding non-human residues. Humanized 
antibodies may also comprise residues which are found neither in the recipient antibody nor 
in the imported CDR or framework sequences. The humanized antibody can comprise at 
least one or two variable domains, in which all or substantially all of the CDR regions 
correspond to those of a non-human immunoglobulin and all or substantially all of the 
constant regions are those of a human immunoglobulin consensus sequence. The 
humanized antibody preferably comprises at least a portion of an immunoglobulin constant 
region (Fc) of a human immunoglobulin. 

[0528] Humanized antibodies can be produced using transgenic mice which are 

incapable of expressing endogenous immunoglobulin heavy and light chains but can express 
human heavy and light chains. The transgenic mice are immunized in the normal fashion 
with a selected antigen. Monoclonal antibodies directed against the antigen can be obtained 
using conventional hybridoma technology. The human immunoglobulin transgenes 
harbored in the transgenic mice rearrange during B cell differentiation, and subsequently 
undergo class switching and somatic mutation. Using this technique, therapeutically useful 
IgG, IgA and IgE antibodies can be prepared. 

[0529] In addition, humanized antibodies which recognize a selected epitope can be 

generated using a technique referred to as "guided selection." In this approach a selected 
non-human monoclonal antibody, e.g., a murine antibody, is used to guide the selection of a 
humanized antibody recognizing the same epitope. 

[0530] In one embodiment, the antibodies of the present invention can bind to the 

corresponding RCC disease gene products or the desired antigens with a binding affinity 
constant Ka of at least 10⁴ M⁻¹, such as at least 10⁵ M⁻¹, 10⁶ M⁻¹, 10⁷ M⁻¹ or more. 
[0531] The antibodies of this invention can be labeled with one or more detectable 

moieties to allow for detection of antibody-antigen complexes. The detectable moieties can 
include compositions detectable by spectroscopic, enzymatic, photochemical, biochemical, 
bioelectronic, immunochemical, electrical, optical or chemical means. The detectable 
moieties include, but are not limited to, radioisotopes, chemiluminescent compounds, 
labeled binding proteins, heavy metal atoms, spectroscopic markers such as fluorescent 
markers and dyes, magnetic labels, linked enzymes, mass spectrometry tags, spin labels, 
electron transfer donors and acceptors, and the like. 

[0532] In accordance with yet another aspect of the present invention, the levels of 

polypeptides in peripheral blood samples can be determined by detecting the biological 
activities associated with the polypeptides. If a biological function/activity of a polypeptide 
is known, suitable in vitro bioassays can be designed to evaluate the biological 
function/activity, thereby determining the amount of the polypeptide in the sample. 
[0533] The expression levels of RCC disease genes or the respective CPSs can be 

compared to the reference expression levels using various methods. These reference levels 
can be determined using peripheral blood samples isolated from disease-free humans, RCC 
or other solid tumor patients. The comparison can be performed using the fold change or 
the absolute difference between the expression levels to be compared. One or more RCC 
disease genes or CPSs can be used in the comparison. For instance, at least 2, 3, 4, 6, 8, 10, 
12, 14, 16, 18, 20, 30, 40, 50, 60, 70, 80, 90, 100, 150, 200 or more RCC disease genes or 
CPSs can be used. 
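
The two comparison measures named above, fold change and absolute difference, can be sketched as follows. The expression values and the function name `compare_expression` are hypothetical and serve only to illustrate the arithmetic; the gene names are taken from the lists given later in this section.

```python
def compare_expression(test_levels, reference_levels):
    """Return per-gene fold change and absolute difference versus a reference.

    Both arguments map a gene name to an expression level; reference levels
    are assumed to be greater than zero.
    """
    results = {}
    for gene, test_value in test_levels.items():
        ref_value = reference_levels[gene]
        results[gene] = {
            "fold_change": test_value / ref_value,
            "abs_difference": abs(test_value - ref_value),
        }
    return results


# Hypothetical expression levels (arbitrary units)
test = {"TLR2": 240.0, "EEF1A2": 30.0}
reference = {"TLR2": 80.0, "EEF1A2": 60.0}
comparison = compare_expression(test, reference)
```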
[0534] The expression patterns can also be compared by using one or more ratios 

between the expression levels of different disease genes. Other suitable measures or 
indicators can also be employed for assessing the relationship or difference between 
different expression patterns. 

[0535] The use of multiple CPSs or RCC disease genes can reduce the risk of false 

prediction. In one embodiment, if more than 50% (such as 60%, 70%, 80% or 90%) of the 
selected CPSs or RCC disease genes suggest that the test human has RCC or is RCC-free, 
then a prediction for RCC or RCC-free will be made respectively. In another embodiment, 
the gene expression-based comparison is combined with other clinical evidence in 
predicting RCC and/or other solid tumors. 
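
The majority rule described above can be illustrated with a short sketch. The function name `majority_prediction`, the per-gene calls, and the choice to return no prediction when neither class exceeds the cutoff are assumptions made for illustration.

```python
def majority_prediction(gene_calls, min_fraction=0.5):
    """Predict "RCC" or "RCC-free" when more than min_fraction of genes agree.

    gene_calls maps each selected gene (or CPS) to the class it suggests.
    Returns None when neither class exceeds the cutoff.
    """
    total = len(gene_calls)
    if total == 0:
        return None
    for label in ("RCC", "RCC-free"):
        votes = sum(1 for call in gene_calls.values() if call == label)
        if votes / total > min_fraction:
            return label
    return None


# Hypothetical per-gene calls for one test sample
calls = {"TLR2": "RCC", "LGALS3": "RCC", "SOD2": "RCC", "EEF1A2": "RCC-free"}
prediction = majority_prediction(calls)
```

Raising `min_fraction` to 0.6, 0.7, 0.8 or 0.9 gives the stricter variants mentioned in the paragraph.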

[0536] In a preferred embodiment, the RCC disease genes used for predicting RCC 

versus RCC-free include or consist of one or more genes selected from the group consisting 
of EEF1A2, TLR2, BRF2, LGALS3, SNRPG, DKFZP586E1621, NUMA1, SOD2, 
AKR1B1, DUSP6, SMARCE1, KIAA0669, MSF, IL1RN, PTMA, KIAA0410, PSMD3, 
T54, C1QBP, and OSR1. For instance, the RCC disease genes used for RCC prediction can 
include or consist of at least 2, 4, 6, 8, 10, 12, 14, 16, 18 or 20 genes selected from the 
group. For another instance, the RCC disease genes used for diagnosis can comprise (1) at 
least 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 genes selected from the group consisting of TLR2, 
LGALS3, DKFZP586E1621, SOD2, DUSP6, KIAA0669, IL1RN, KIAA0410, T54 and 
OSR1, and/or (2) at least 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 genes selected from the group 
consisting of EEF1A2, BRF2, SNRPG, NUMA1, AKR1B1, SMARCE1, MSF, PTMA, 
PSMD3 and C1QBP. 

[0537] In another preferred embodiment, the CPSs used for predicting RCC versus 

RCC-free include or consist of one or more CPSs selected from the group consisting of CPS 
1, CPS 3, CPS 4, CPS 6, CPS 18, CPS 38, CPS 53, CPS 255, CPS 256, CPS 257, CPS 258, 
CPS 259, CPS 260, CPS 261, CPS 262, CPS 263, CPS 264, CPS 265, CPS 266, and CPS 
267. For instance, the CPSs used for RCC prediction can include or consist of at least 2, 4, 
6, 8, 10, 12, 14, 16, 18 or 20 CPSs selected from the group. For another instance, the CPSs 
used for diagnosis can comprise (1) at least 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 CPSs selected from 
the group consisting of CPS 1, CPS 3, CPS 4, CPS 6, CPS 18, CPS 38, CPS 53, CPS 261, 
CPS 264 and CPS 267, and/or (2) at least 1, 2, 3, 4, 5, 6, 7, 8, 9 or 10 CPSs selected from 
the group consisting of CPS 255, CPS 256, CPS 257, CPS 258, CPS 259, CPS 260, CPS 
262, CPS 263, CPS 265, and CPS 266. 
[0538] In yet another preferred embodiment, the RCC disease genes used for 

predicting RCC versus RCC-free include or consist of one or more genes selected from the 
group consisting of CD44, KIAA0410, MARCO, MAP3K8, NSP-CL, PIP5K1C, NRG1, 
RAB31, LGALS3, MEF2D, ITGA7, LHFPL2, ETS2, KHSRP, ENIGMA, 
UNK_AF038187, RAB13, TLR2, T54 and DUSP6. For instance, the RCC disease genes 
used for prediction can include or consist of at least 2, 4, 6, 8, 10, 12, 14, 16, 18 or 20 genes 
selected from the group. 

[0539] In still another preferred embodiment, the CPSs used for predicting RCC 

versus RCC-free include or consist of one or more CPSs selected from the group consisting 
of CPSs 1, 3, 4, 5, 6, 7, 9, 10, 11, 16, 28, 31, 268, 264, 279, 280, 281, 282, 283 and 284. 
For instance, the CPSs used for prediction can include or consist of at least 4, 6, 8, 10, 12, 
14, 16, 18 or 20 CPSs selected from the group. 

[0540] In another preferred embodiment, the RCC disease genes used for predicting 

RCC and/or other solid tumors, such as prostate cancer and head/neck cancer, include or 
consist of one or more genes selected from the group consisting of CD44, CRADD, 
CCRL2, KIAA0837, KIAA0707, KIAA1113, EREG, UNK_AL050119, PPARD, CTSL, 
ATP2B1, UNK_AF052115, MITF, STAT3, KIAA0410, TPD52L2, UNK_AI732885, 
MARCO, LOC64116, and PDNP2. For instance, the RCC disease genes used for 
prediction can include or consist of at least 2, 4, 6, 8, 10, 12, 14, 16, 18 or 20 genes selected 
from the group. 

[0541] In yet another preferred embodiment, the CPSs used for predicting RCC 

and/or other solid tumors, such as prostate cancer and head/neck cancer, include or consist 
of one or more CPSs selected from the group consisting of CPSs 17, 31, 37, 50, 59, 64, 69, 
71, 264, 268, 269, 270, 271, 272, 273, 274, 275, 276, 277 and 278. For instance, the CPSs 
used for prediction can include or consist of at least 2, 4, 6, 8, 10, 12, 14, 16, 18 or 20 CPSs 
selected from the group. 

[0542] In one embodiment, the RCC disease genes used for predicting solid tumor 

versus solid tumor-free include or consist of one or more genes selected from the group 
consisting of NUMA1, CXCR4, IL10RA, M9, FAU, BRF2, RPS6, EEF1A2, BAGS, 
AKR1B1, UNK_AL022721, C1QBP, DKZP586E0820, NONO, PSMD3, UNK_N74607, 
UNK_AI743507, MAPKAPK5, and UNK_U79297. For instance, the RCC disease genes 
used for prediction can include or consist of at least 2, 4, 6, 8, 10, 12, 14, 16, 18 or 20 genes 
selected from the group. 
[0543] In another embodiment, the CPSs used for predicting solid tumor versus 

solid tumor-free include or consist of one or more CPSs selected from the group consisting 
of CPSs 258, 285, 107, 286, 287, 256, 288, 255, 289, 259, 290, 266, 291, 292, 265, 131, 
293, 294 and 295. For instance, the CPSs used for prediction can include or consist of at 
least 2, 4, 6, 8, 10, 12, 14, 16, 18 or 20 CPSs selected from the group. 

[0544] Comparison of the expression profiles can also be performed based on a 

quantitative hybridization of arrayed DNA clones, the serial analysis of gene expression 
(SAGE) technology, or electronic analysis, such as the Transcript Imaging tool or the 
GEMTOOLS gene expression analysis program (Incyte Pharmaceuticals) or the 
GeneCalling and Quantitative Expression Analysis technology (Curagen). Algorithms, such 
as pattern recognition programs, can be used to compare the expression profiles of RCC 
disease genes with reference expression profiles. 

E. RCC and Other Solid Tumor Prediction Based On Weighted Voting 
Algorithm 

[0545] In accordance with one aspect of this invention, a weighted voting algorithm 

is used for comparing the expression profiles of a set of RCC disease genes in the human 
under diagnosis, to the expression profiles of the same set of RCC disease genes in disease- 
free humans and known RCC or solid tumor patients. The weighted voting algorithm is 
described in T.R. Golub, et al., Science, 286: 531-537 (1999), and D.K. Slonim et al., 
Procs. of the Fourth Annual International Conference on Computational Molecular Biology, 
Tokyo, Japan, April 8-11, p263-272 (2000). The algorithm can involve two-class or multi- 
class analysis. Multi-class analysis software, such as GeneCluster 2 software, is available 
from MIT Center for Genome Research at Whitehead Institute. The algorithm is capable of 
assigning the human under diagnosis to one of at least two classes. 

[0546] Under one form of the algorithm, the human to be diagnosed is assigned to 

one of two classes (referred to as class 0 and class 1). For instance, class 0 may represent 
and consist of disease-free humans, and class 1 may represent and consist of RCC patients. 
A set of RCC disease genes are selected to create a class predictor (classifier). Each gene in 
the class predictor casts a weighted vote for one of the two classes (class 0 and class 1). 
The vote of gene "g" can be defined as v_g = a_g (x_g - b_g), wherein a_g = P(g,c) reflects the 
correlation between the expression level of gene "g" and the class distinction, b_g = [x0(g) + 
x1(g)]/2 is the average of the mean logs of the expression levels of gene "g" in class 0 and 
class 1, and x_g represents the normalized log of the expression level of gene "g" in the test 
sample. A positive v_g indicates a vote for class 0, and a negative v_g indicates a vote for 
class 1. V0 denotes the sum of all positive votes, and V1 denotes the absolute value of the 
sum of all negative votes. A prediction strength PS is defined as PS = (V0 - V1)/(V0 + V1). 
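
The vote and prediction strength defined above can be sketched in a few lines. The weights, midpoints and sample values below are hypothetical, as is the function name `weighted_vote`; the sign convention (a positive vote counts toward class 0) follows the definition in this paragraph.

```python
def weighted_vote(classifier, sample):
    """Apply the weighted voting rule to one test sample.

    classifier maps gene -> (a_g, b_g); sample maps gene -> x_g, the
    normalized log expression level. Returns (predicted_class, PS), where
    predicted_class is None if the two vote totals are equal.
    """
    v0 = 0.0  # sum of positive votes (class 0)
    v1 = 0.0  # absolute value of the sum of negative votes (class 1)
    for gene, (a_g, b_g) in classifier.items():
        vote = a_g * (sample[gene] - b_g)
        if vote > 0:
            v0 += vote
        else:
            v1 += -vote
    total = v0 + v1
    ps = (v0 - v1) / total if total else 0.0
    if v0 > v1:
        return 0, ps
    if v1 > v0:
        return 1, ps
    return None, ps


# Hypothetical classifier parameters and test sample
classifier = {"TLR2": (-1.5, 6.0), "EEF1A2": (2.0, 4.0), "BRF2": (1.0, 5.0)}
sample = {"TLR2": 8.0, "EEF1A2": 3.5, "BRF2": 6.0}
predicted_class, ps = weighted_vote(classifier, sample)
```

In this example the votes are -3.0, -1.0 and +1.0, so V0 = 1.0, V1 = 4.0 and PS = -0.6, which is a vote for class 1.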
[0547] Cross-validation can be used to evaluate the accuracy of the class predictor 

created under the weighted voting algorithm. Briefly, one sample which has been used to 
identify the RCC disease genes under the neighborhood analysis is withheld. A class 
predictor is created based on the remaining samples, and then used to predict the class of the 
sample withheld. This process can be repeated for each sample that has been used in the 
neighborhood analysis. Class predictors comprising different RCC disease genes can be 
evaluated using the cross-validation process, and the best class predictor with the most 
accurate prediction can be identified. In addition, a suitable prediction strength (PS) 
threshold can be assessed by plotting the cumulative cross-validation error rate against the 
prediction strength. 
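
The leave-one-out procedure described above can be sketched generically. In this illustration the class predictor is replaced by a deliberately simple stand-in (nearest class mean on a single log-expression value); the function names and the data are hypothetical, and any training and prediction routine with the same shape could be substituted.

```python
def leave_one_out_error(samples, labels, train, predict):
    """Withhold each sample in turn, train on the rest, and score the held-out one."""
    errors = 0
    for i in range(len(samples)):
        train_x = samples[:i] + samples[i + 1:]
        train_y = labels[:i] + labels[i + 1:]
        model = train(train_x, train_y)
        if predict(model, samples[i]) != labels[i]:
            errors += 1
    return errors / len(samples)


def train_nearest_mean(xs, ys):
    """Stand-in trainer: the mean value of each class."""
    mean0 = sum(x for x, y in zip(xs, ys) if y == 0) / ys.count(0)
    mean1 = sum(x for x, y in zip(xs, ys) if y == 1) / ys.count(1)
    return mean0, mean1


def predict_nearest_mean(model, x):
    """Stand-in predictor: the class whose mean is closer to x."""
    mean0, mean1 = model
    return 0 if abs(x - mean0) <= abs(x - mean1) else 1


# Hypothetical log-expression values; the third class-0 sample is an outlier
samples = [1.0, 1.2, 3.2, 3.0, 3.1, 2.9]
labels = [0, 0, 0, 1, 1, 1]
error_rate = leave_one_out_error(
    samples, labels, train_nearest_mean, predict_nearest_mean
)
```

Only the outlier is misclassified when withheld, giving an error rate of one in six for this toy data set.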

[0548] In one embodiment, a positive prediction that a test sample belongs to class 

0 or class 1 can be made if the absolute value of PS for the test sample is no less than 0.3. 
Other PS thresholds, such as no less than 0.1 or 0.2, can also be used. 
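
A minimal sketch of this thresholding step is given below. The function name `call_class` and the use of `None` to represent "no call" are assumptions for illustration; a positive PS is taken to indicate class 0, consistent with the voting definition above.

```python
def call_class(ps, threshold=0.3):
    """Return 0 or 1 when |PS| is no less than the threshold, otherwise None."""
    if ps == 0 or abs(ps) < threshold:
        return None  # prediction strength too weak to make a call
    return 0 if ps > 0 else 1
```

For example, `call_class(0.2)` makes no call under the default threshold, whereas `call_class(0.2, threshold=0.1)` returns class 0.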

[0549] In another embodiment, the class predictor or classifier consists of n RCC 

disease genes identified under the neighborhood analysis. Half of these RCC disease 
genes have the largest P(g,c) scores, and the other half have the largest -P(g,c) scores. The 
number n is the only free parameter in defining the class predictor. 
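
Selecting the n genes can be sketched as follows. This illustration assumes that P(g,c) is the signal-to-noise score of Golub et al., computed here as the difference of the class means divided by the sum of the class standard deviations, with the sign chosen so that a positive score favors class 0. The gene names, data and function names are hypothetical.

```python
from statistics import mean, stdev


def signal_to_noise(class0_values, class1_values):
    """Assumed form of P(g,c): (mean0 - mean1) / (sd0 + sd1)."""
    return (mean(class0_values) - mean(class1_values)) / (
        stdev(class0_values) + stdev(class1_values)
    )


def select_predictor_genes(expr_class0, expr_class1, n):
    """Pick n/2 genes with the largest P(g,c) and n/2 with the largest -P(g,c).

    expr_class0 and expr_class1 map gene -> list of log expression levels
    in the training samples of each class.
    """
    if n < 2 or n % 2 or n > len(expr_class0):
        raise ValueError("n must be even, at least 2, and no more than the gene count")
    scores = {g: signal_to_noise(expr_class0[g], expr_class1[g]) for g in expr_class0}
    ranked = sorted(scores, key=scores.get, reverse=True)
    half = n // 2
    return ranked[:half], ranked[-half:][::-1]


# Hypothetical training data for four placeholder genes
expr0 = {"G1": [5.0, 5.2, 4.8], "G2": [2.0, 2.1, 1.9],
         "G3": [3.0, 3.4, 2.6], "G4": [4.0, 4.1, 3.9]}
expr1 = {"G1": [2.0, 2.2, 1.8], "G2": [5.0, 5.1, 4.9],
         "G3": [3.1, 3.5, 2.7], "G4": [4.5, 4.6, 4.4]}
class0_genes, class1_genes = select_predictor_genes(expr0, expr1, 2)
```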

[0550] Subsection G of this specification depicts detailed examples of building and 

training the RCC disease classifiers. 

[0551] In a preferred embodiment, the class predictor comprises or consists of at 

least 2, 4, 6, 8, 10, 12, 14, 16, 18 or 20 genes selected from EEF1A2, TLR2, BRF2, 
LGALS3, SNRPG, DKFZP586E1621, NUMA1, SOD2, AKR1B1, DUSP6, SMARCE1, 
KIAA0669, MSF, IL1RN, PTMA, KIAA0410, PSMD3, T54, C1QBP, and OSR1. For 
instance, a 2-gene class predictor can consist of TLR2 and EEF1A2. A 4-gene class 
predictor can consist of TLR2, LGALS3, EEF1A2, and BRF2. A 6-gene class predictor can 
consist of TLR2, LGALS3, DKFZP586E1621, EEF1A2, BRF2, and SNRPG. An 8-gene 
class predictor can consist of TLR2, LGALS3, DKFZP586E1621, SOD2, EEF1A2, BRF2, 
SNRPG, and NUMA1. A 10-gene class predictor can consist of TLR2, LGALS3, 
DKFZP586E1621, SOD2, DUSP6, EEF1A2, BRF2, SNRPG, NUMA1, and AKR1B1. A 
12-gene class predictor can consist of TLR2, LGALS3, DKFZP586E1621, SOD2, DUSP6, 
KIAA0669, EEF1A2, BRF2, SNRPG, NUMA1, AKR1B1, and SMARCE1. A 14-gene 
class predictor can consist of TLR2, LGALS3, DKFZP586E1621, SOD2, DUSP6, 
KIAA0669, IL1RN, EEF1A2, BRF2, SNRPG, NUMA1, AKR1B1, SMARCE1, and MSF. 
A 16-gene class predictor can consist of TLR2, LGALS3, DKFZP586E1621, SOD2, 
DUSP6, KIAA0669, IL1RN, KIAA0410, EEF1A2, BRF2, SNRPG, NUMA1, AKR1B1, 
SMARCE1, MSF, and PTMA. An 18-gene class predictor can consist of TLR2, LGALS3, 
DKFZP586E1621, SOD2, DUSP6, KIAA0669, IL1RN, KIAA0410, T54, EEF1A2, BRF2, 
SNRPG, NUMA1, AKR1B1, SMARCE1, MSF, PTMA, and PSMD3. Finally, a 20-gene 
class predictor consists of EEF1A2, TLR2, BRF2, LGALS3, SNRPG, DKFZP586E1621, 
NUMA1, SOD2, AKR1B1, DUSP6, SMARCE1, KIAA0669, MSF, IL1RN, PTMA, 
KIAA0410, PSMD3, T54, C1QBP, and OSR1. 

[0552] In another preferred embodiment, the class predictor comprises (1) at least 1, 

2, 3, 4, 5, 6, 7, 8, 9, or 10 genes selected from the group consisting of TLR2, LGALS3, 
DKFZP586E1621, SOD2, DUSP6, KIAA0669, IL1RN, KIAA0410, T54 and OSR1, and 
(2) at least 1, 2, 3, 4, 5, 6, 7, 8, 9, or 10 genes selected from the group consisting of 
EEF1A2, BRF2, SNRPG, NUMA1, AKR1B1, SMARCE1, MSF, PTMA, PSMD3 and 
C1QBP. 

[0553] In yet another preferred embodiment, the class predictor comprises or 

consists of 2, 4, 6, 8, 10, 12, 14, 16, 18 or 20 genes selected from the group consisting of 
CD44, KIAA0410, MARCO, MAP3K8, NSP-CL, PIP5K1C, NRG1, RAB31, LGALS3, 
MEF2D, ITGA7, LHFPL2, ETS2, KHSRP, ENIGMA, UNK_AF038187, RAB13, TLR2, 
T54 and DUSP6. 

[0554] In still another preferred embodiment, the class predictor comprises or 

consists of 2, 4, 6, 8, 10, 12, 14, 16, 18 or 20 genes selected from the group consisting of 
CD44, CRADD, CCRL2, KIAA0837, KIAA0707, KIAA1113, EREG, UNK_AL050119, 
PPARD, CTSL, ATP2B1, UNK_AF052115, MITF, STAT3, KIAA0410, TPD52L2, 
UNK_AI732885, MARCO, LOC64116, and PDNP2. The class predictors of this 
embodiment can be used to predict RCC, prostate cancer, head/neck cancer, and disease- 
free. 

[0555] In still yet another preferred embodiment, the class predictor comprises or 

consists of 2, 4, 6, 8, 10, 12, 14, 16, 18 or 20 genes selected from the group consisting of 
NUMA1, CXCR4, IL10RA, M9, FAU, BRF2, RPS6, EEF1A2, BAGS, AKR1B1, 
UNK_AL022721, C1QBP, DKZP586E0820, NONO, PSMD3, UNK_N74607, 
UNK_AI743507, MAPKAPK5, and UNK_U79297. The class predictors of this 
embodiment can be used to predict solid tumor versus solid tumor-free, regardless of the 
particular type of the solid tumor. The solid tumor predictable in this embodiment includes 
RCC, prostate cancer, and head/neck cancer. 

[0556] In one embodiment, the reference expression levels of RCC disease genes, 

such as the expression levels derived from disease-free humans or known RCC or solid 
tumor patients, are stored in a database and are readily retrievable. In another embodiment, 
the comparison between expression profiles of various genes is performed electronically, 
such as using a computer system. The computer system comprises a processor coupled to a 
memory which stores data representing the expression profiles being compared. Preferably, 
the memory is readable as well as rewritable. The expression data stored in the memory can 
be changed, retrieved or otherwise manipulated. The memory also stores one or more 
programs capable of causing the processor to compare the stored expression profiles. For 
instance, the program may be able to execute a weighted voting algorithm. The processor 
can also be coupled to a polynucleotide array scanner and is capable of receiving signals 
from the scanner. 

[0557] In another embodiment, a confidence threshold is established to optimize the 

accuracy of prediction and minimize the incidence of both false positive and false negative 
results. Average confidence scores collected for the accumulating pool of correctly 
diagnosed patients and correctly non-diagnosed disease-free individuals can be calculated 
and a reference range of values, for the particular predictive gene set diagnostic in question, 
can be reported. 

F. Other Applications 

[0558] The systematic gene expression analysis of this invention can be used to 

identify genes that are differentially expressed in peripheral blood samples isolated at 
different stages of the progression, development or treatment of RCC and/or other solid 
tumors. Genes thus-identified are molecular markers for monitoring the progression, 
development or treatment of RCC and/or other solid tumors. Genes thus-identified can also 
be used as surrogate markers for evaluating the efficacy of a treatment for RCC or other 
solid tumors. 

[0559] A clinical challenge concerning RCC and other solid tumors is the highly 

variable response of patients to therapy. The basic concept of pharmacogenomics is to 
understand a patient's genotype in relation to available treatment options and then 
individualize the most appropriate option for the patient. Different classes of RCC and/or 
other solid tumor patients can be created based on their different responses to a given 
therapy. Differentially expressed genes in these classes can be identified using the global 
gene expression analysis. Genes thus-identified can serve as predictive markers for 
forecasting whether a particular patient will be more or less responsive to the given therapy. 
For patients predicted to have a favorable outcome for the therapy, efforts to minimize 
toxicity of the therapy may be considered, whereas for those predicted not to respond to the 
therapy, treatment with other therapies or experimental regimes can be used. 
[0560] The present invention also contemplates expression vectors encoding the 

RCC disease genes. The RCC disease genes may be under-expressed in RCC tumor cells. 
By introducing the expression vectors into the patients, abnormal expression of the target 
genes may be corrected. 

[0561] Suitable expression or gene delivery vectors are well known in the art. 

Preferably, these vectors are viral vectors, such as retroviral, lentiviral, adenoviral, adeno- 
associated viral (AAV), herpes viral, or alphavirus vectors. The viral vectors can also be 
astrovirus, coronavirus, orthomyxovirus, papovavirus, paramyxovirus, parvovirus, 
picornavirus, poxvirus, or togavirus viral vectors. 

[0562] Delivery of the expression constructs is not limited to the above mentioned 

viral vectors. Other delivery methods can also be employed. These methods include 
nucleic acid expression vectors, polycationic condensed DNA linked or unlinked to killed 
adenovirus, ligand linked, gene guns, ionizing radiation, nucleic charge neutralization, or 
fusion with cell membranes. Naked DNA can also be employed. Exemplary methods to 
use naked DNA are known in the art. Uptake efficiency may be improved using 
biodegradable latex beads. This method can be further improved by treating the beads to 
increase their hydrophobicity. Liposome-based methods can also be used. 
[0563] In addition, this invention contemplates expression vectors capable of 

expressing sequences that are anti-sense to a RCC disease gene of interest. The RCC 
disease gene of interest may be over-expressed in RCC or other solid tumor patients. By 
introducing the antisense expression vector into these patients, the abnormal expression of 
the gene can be corrected. 

[0564] An "antisense" polynucleotide comprises a nucleotide sequence which is 

complementary to a "sense" polynucleotide which encodes a protein. An antisense 
polynucleotide can bind via hydrogen bonds to the sense polynucleotide. The antisense 
polynucleotide can be complementary to an entire coding strand of the target gene, or a 
portion thereof. In one embodiment, the antisense polynucleotide molecule is antisense to a 
"noncoding region" in the coding strand of the target gene. 

[0565] Antisense polynucleotides can be designed according to the rules of Watson 

and Crick base pairing. They may be oligonucleotides which are antisense to only a portion 
of the target gene. An antisense polynucleotide can be, for example, about 5, 10, 15, 20, 25, 
30, 35, 40, 45 or 50 nucleotides in length. An antisense polynucleotide can be constructed 
using chemical synthesis and enzymatic ligation reactions as appreciated by one of skill in 
the art. For example, an antisense polynucleotide (e.g., an antisense oligonucleotide) can be 
chemically synthesized using naturally occurring nucleotides or variously modified 
nucleotides designed to increase the biological stability of the molecules or to increase the 
physical stability of the duplex formed between the antisense and sense polynucleotides. 
Alternatively, the antisense polynucleotide can be produced biologically using an 
expression vector into which a polynucleotide has been subcloned in an antisense 
orientation (i.e., RNA transcribed from the inserted polynucleotide will be of an antisense 
orientation to the target polynucleotide of interest). 
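
The Watson and Crick pairing rule used to design an antisense sequence can be illustrated with a short sketch. The input sequence and the function name `antisense_rna` are hypothetical; the example covers only unmodified RNA bases and does not address the modified nucleotides mentioned above.

```python
# Watson-Crick complements for RNA bases
COMPLEMENT = {"A": "U", "C": "G", "G": "C", "U": "A"}


def antisense_rna(sense_mrna):
    """Return the antisense RNA, written 5' to 3', for an mRNA given 5' to 3'."""
    # Reverse the strand, then complement each base.
    return "".join(COMPLEMENT[base] for base in reversed(sense_mrna.upper()))


# Hypothetical six-nucleotide target region
antisense = antisense_rna("AUGGCC")
```

Here the target 5'-AUGGCC-3' yields the antisense sequence 5'-GGCCAU-3'.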

[0566] The antisense polynucleotides can be administered to a subject or applied in 

situ such that they hybridize or bind to cellular mRNAs and/or genomic DNAs of the target 
gene, thereby inhibiting the expression of the target gene. The hybridization can result in a 
stable duplex via conventional nucleotide complementarity. An example route for 
administering antisense polynucleotides includes direct injection at a tissue site. Antisense 
polynucleotides can also be modified first, and then administered systemically. For 
example, for systemic administration, antisense molecules can be modified such that they 
specifically bind to receptors or antigens expressed on a selected cell surface. Suitable 
modifications include linking the antisense polynucleotides to peptides or antibodies which 
bind to the cell surface receptors or antigens. In addition, the antisense polynucleotides can 
be delivered to cells using vectors. To achieve sufficient intracellular concentrations of the 
antisense molecules, strong pol II or pol III promoters may be used in the vectors. 

[0567] In one embodiment, the antisense polynucleotides are α-anomeric 

polynucleotides. An α-anomeric polynucleotide molecule forms a specific double-stranded 
hybrid with a complementary RNA in which, contrary to the usual β-units, the strands run 
parallel to each other. The antisense polynucleotide molecule can also comprise a 
2'-o-methylribonucleotide or a chimeric RNA-DNA analogue. 

[0568] In another embodiment, the antisense polynucleotide is a ribozyme. 

Ribozymes are catalytic RNA molecules with ribonuclease activity which are capable of 
cleaving a single-stranded polynucleotide, such as an mRNA, to which they have a 
complementary region. Thus, ribozymes can be used to catalytically cleave mRNA 
transcripts of the target gene in order to inhibit its expression. mRNAs transcribed from the 
target gene can be used to select from a pool of RNA molecules a catalytic RNA having a 
specific ribonuclease activity. Alternatively, the expression of the target gene can be 
inhibited by using nucleotide sequences complementary to the regulatory region (e.g., the 
promoter and/or enhancers). These nucleotide sequences can form triple helical structures 
that prevent transcription of the gene in target cells. 

[0569] Expression of the target gene can also be inhibited using RNA interference 

("RNAi"). This is a technique used in post-transcriptional gene silencing ("PTGS"), in 
which the targeted gene activity is specifically abolished. RNAi resembles in many aspects 
PTGS in plants and has been detected in many invertebrates including trypanosome, hydra, 
planaria, nematode and fruit fly (Drosophila melanogaster). It may be involved in the 
modulation of transposable element mobilization and antiviral state formation. RNAi in 
mammalian systems is disclosed in PCT application WO00/63364. In one embodiment, 
dsRNA of at least about 21 nucleotides, homologous to the target gene, is introduced into 
cells. 

[0570] Antibodies against the polypeptides encoded by the RCC disease genes can 

also be prepared and administered to patients in order to affect the function of the RCC 
disease genes. In one embodiment, the antibodies can reduce at least 25% of the activity of 
the target gene. Preferably, the antibodies reduce at least about 50% of the activity of the 
corresponding gene. Highly preferably, the antibodies reduce about 95-100% of the activity 
of the target gene. 

[0571] A pharmaceutical composition comprising the antibody or expression vector 

of this invention can be made. The pharmaceutical composition also includes a 
pharmaceutically acceptable carrier. As used herein, a "pharmaceutically acceptable 
carrier" is intended to include any and all solvents, solubilizers, fillers, stabilizers, binders, 
absorbents, bases, buffering agents, lubricants, controlled release vehicles, diluents, 
emulsifying agents, humectants, dispersion media, coatings, antibacterial or 
antifungal agents, isotonic and absorption delaying agents, and the like, compatible with 
pharmaceutical administration. The use of such media and agents for pharmaceutically 
active substances is well-known in the art. Except insofar as any conventional media or 
agent is incompatible with the active compound, use thereof in the compositions is 
contemplated. Supplementary agents can also be incorporated into the compositions. 
[0572] A pharmaceutical composition can be formulated to be compatible with its 

intended route of administration. Examples of routes of administration include parenteral, 
e.g., intravenous, intradermal, subcutaneous, oral (e.g., inhalation), transdermal (topical), 
transmucosal, and rectal administration. Solutions or suspensions used for parenteral, 
intradermal, or subcutaneous application can include the following components: a sterile 
diluent such as water for injection, saline solution, fixed oils, polyethylene glycols, 
glycerine, propylene glycol or other synthetic solvents; antibacterial agents such as benzyl 
alcohol or methyl parabens; antioxidants such as ascorbic acid or sodium bisulfate; 
chelating agents such as ethylenediaminetetraacetic acid; buffers such as acetates, citrates or 
phosphates and agents for the adjustment of tonicity such as sodium chloride or dextrose. 
pH can be adjusted with acids or bases, such as hydrochloric acid or sodium hydroxide. 
The parenteral preparation can be enclosed in ampoules, disposable syringes or multiple 
dose vials made of glass or plastic. 

[0573] Examples of suitable RCC disease genes that can be used as the targets of 

gene therapy or drug treatment include, but are not limited to, DUSP6, DRD2, ABL1, 
GUK1, MAP2K3, BSG, PPARG, TNNT1, ERN1, C4A, CCR1, PPARD, PDXK, MMP9, 
PPP3CB, CHRNA4, C8FW, PDNP2, ALDH5A1, and GPR12. Other examples include the 
RCC disease genes that are over- or under-expressed in both PBMCs and RCC tumor 
tissues. 

[0574] In one embodiment, the present invention provides a kit comprising one or 

more polynucleotides, each of said one or more polynucleotides capable of hybridizing 
under stringent conditions to a gene selected from Gene-Table-4. Any primer/probe of this 
invention, or the complement thereof, can be included in the kit. The polynucleotide(s) can 
be labeled with fluorescent, radioactive, or other detectable moieties. In one instance, the 
one or more polynucleotides are contained in vials, tubes, bottles or other containing means. 
In another instance, the one or more polynucleotides are stably attached to a solid support. 
Nucleic acid hybridization can be directly carried out on the solid support. In yet another 
instance, the kit contains at least 2, 3, 4, 5, 10, 15, 20, or more polynucleotides, each 
different polynucleotide capable of hybridizing under stringent conditions to a different 
respective gene selected from Gene-Table-4. 

[0575] In another embodiment, the kit of the present invention contains one or more 

antibodies capable of binding to the polypeptides encoded by the genes selected from 
Gene-Table-4. The antibodies can be labeled or unlabeled. Any antibody of this invention can be 
included in the kit. In one example, the kit also includes other immunodetection reagents, 
such as secondary antibodies, controls or enzyme substrates. In another example, the 
antibodies are included in one or more containers. In yet another example, the antibodies 
are stably bound to a solid support, such as a film, membrane, column matrix, or microtiter 
plate wells. Immunoassays can be performed directly on the solid support. In still yet 
another example, the kit contains at least 2, 3, 4, 5, 10, 15, 20, or more different antibodies, 
each different antibody capable of binding to a polypeptide encoded by a different 
respective gene selected from Gene-Table-4. 

[0576] It should be understood that the above-described embodiments and the 

following examples are given by way of illustration, not limitation. Various changes and 
modifications within the scope of the present invention will become apparent to those 
skilled in the art from the present description. 

G. Examples 

Example 1. Isolation of RNA and Preparation of Labeled Microarray Targets 

[0577] PBMCs from the clinical trials were isolated from whole blood samples 

(8mL) collected into CPT tubes according to the standard procedure. All disease-free and 
RCC blood samples were shipped or stored overnight prior to processing. PBMCs were 
purified over Ficoll gradients, washed two times with PBS and counted. Total RNA was 
isolated from PBMC pellets using the RNeasy mini kit (Qiagen, Valencia, CA). Labeled 
target for oligonucleotide arrays was prepared using a modification of the procedure 
described in Lockhart, et al., Nature Biotechnology, 14: 1675-80 (1996). 2 µg total RNA 
was converted to cDNA by priming with an oligo-dT primer containing a T7 DNA 
polymerase promoter at the 5' end. The cDNA was used as the template for in vitro 
transcription using a T7 DNA polymerase kit (Ambion, Woodlands, TX) and biotinylated 
CTP and UTP (Enzo). Labeled cRNA was fragmented in 40 mM Tris-acetate pH 8.0, 100 
mM KOAc, 30 mM MgOAc for 35 minutes at 94°C in a final volume of 40 µl. 

Example 2. Hybridization to Affymetrix Microarrays and Detection of Fluorescence 

[0578] Individual RCC and disease-free samples were hybridized to HgU95A 

genechip (Affymetrix). No samples were pooled. 45 RCC patients and 20 disease-free 
volunteers were involved in the study. Tumors of the RCC patients were 
histopathologically classified as specific renal cell carcinoma subtypes using the Heidelberg 
classification of renal cell tumors described in Kovacs, et al., J. Pathol., 183:131-133 
(1997). Among the 45 RCC tumor samples, twenty-four samples were classified as 
conventional (clear cell) carcinomas, one sample was classified as granular, three samples 
were classified as papillary, seven samples were classified as mixed subtypes, and ten tumor 
samples were classified as unknown. 

[0579] 10 µg of labeled target was diluted in 1x MES buffer with 100 µg/ml herring 

sperm DNA and 50 µg/ml acetylated BSA. To normalize arrays to each other and to 
estimate the sensitivity of the oligonucleotide arrays, in vitro synthesized transcripts of 11 
bacterial genes were included in each hybridization reaction as described in Hill et al., 
Science, 290: 809-812 (2000). The abundance of these transcripts ranged from 1:300,000 
(3 ppm) to 1:1000 (1000 ppm) stated in terms of the number of control transcripts per total 
transcripts. As determined by the signal response from these control transcripts, the 
sensitivity of detection of the arrays ranged between about 1:300,000 and 1:100,000 
copies/million. Labeled probes were denatured at 99°C for 5 minutes and then 45°C for 5 
minutes and hybridized to oligonucleotide arrays comprised of over 12,500 human genes 
(HgU95A, Affymetrix). Arrays were hybridized for 16 hours at 45°C. The hybridization 
buffer was comprised of 100 mM MES, 1 M [Na+], 20 mM EDTA, and 0.01% Tween 20. 
After hybridization, the cartridges were washed extensively with wash buffer (6x SSPET), 
for instance, three 10-minute washes at room temperature. These hybridization and 
washing conditions are collectively referred to as "nucleic acid array hybridization 
conditions." The washed cartridges were then stained with phycoerythrin coupled to 
streptavidin. 

[0580] 12x MES stock contains 1.22 M MES and 0.89 M [Na+]. For 1000 ml, the 

stock can be prepared by mixing 70.4 g MES free acid monohydrate, 193.3 g MES sodium 
salt and 800 ml of molecular biology grade water, and adjusting volume to 1000 ml. The 
pH should be between 6.5 and 6.7. 2x hybridization buffer can be prepared by mixing 8.3 
mL of 12x MES stock, 17.7 mL of 5 M NaCl, 4.0 mL of 0.5 M EDTA, 0.1 mL of 10% 
Tween 20 and 19.9 mL of water. 6x SSPET contains 0.9 M NaCl, 60 mM NaH2PO4, 6 mM 
EDTA, pH 7.4, and 0.005% Triton X-100. In some cases, the wash buffer can be replaced 
with a more stringent wash buffer. 1000 ml stringent wash buffer can be prepared by 
mixing 83.3 mL of 12x MES stock, 5.2 mL of 5 M NaCl, 1.0 mL of 10% Tween 20 and 
910.5 mL of water. 
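
As a worked check of the recipe for the 2x hybridization buffer, the following Python sketch computes the final concentration of each component from the stock concentrations and volumes stated above. The helper name `final_concentration` is hypothetical, and the sodium contributed by the EDTA solution is ignored as a simplifying assumption.

```python
def final_concentration(stock_concentration, stock_volume_ml, total_volume_ml):
    """Concentration after dilution, in the same unit as the stock."""
    return stock_concentration * stock_volume_ml / total_volume_ml

# Component volumes (mL) for the 2x hybridization buffer, as listed above.
volumes_ml = {"12x MES stock": 8.3, "5 M NaCl": 17.7, "0.5 M EDTA": 4.0,
              "10% Tween 20": 0.1, "water": 19.9}
total_ml = sum(volumes_ml.values())

mes_molar = final_concentration(1.22, volumes_ml["12x MES stock"], total_ml)
# Sodium from the MES stock (0.89 M) plus sodium from NaCl; sodium carried
# by the EDTA solution is ignored here (a simplifying assumption).
sodium_molar = (final_concentration(0.89, volumes_ml["12x MES stock"], total_ml)
                + final_concentration(5.0, volumes_ml["5 M NaCl"], total_ml))
edta_molar = final_concentration(0.5, volumes_ml["0.5 M EDTA"], total_ml)
tween_percent = final_concentration(10.0, volumes_ml["10% Tween 20"], total_ml)
```

Under these assumptions the 50 mL mixture contains about 0.20 M MES, about 1.92 M sodium, 40 mM EDTA and 0.02% Tween 20, i.e., approximately twice the 100 mM MES, 1 M [Na+], 20 mM EDTA and 0.01% Tween 20 recited for the hybridization buffer.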

Example 3. Gene Expression Data Analysis 

[0581] Data analysis was performed on raw fluorescent intensity values using 

GENECHIP 3.2 software (Affymetrix). GENECHIP 3.2 software uses an algorithm to 
calculate the likelihood as to whether a gene is "absent" or "present" as well as a specific 
hybridization intensity value or "average difference" for each transcript represented on the 
array. The algorithms used in these calculations are described in the Affymetrix GeneChip 
Analysis Suite User Guide (Affymetrix). The "average difference" for each transcript was 
normalized to "frequency" values according to the procedures of Hill et al., Science, 290: 
809-812 (2000). This was accomplished by referring the average difference values on each 
chip to a calibration curve constructed from the average difference values for the 11 control 
transcripts with known abundance that were spiked into each hybridization solution. This 
process also served to normalize between arrays. 
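
The calibration step can be pictured with the following minimal Python sketch. It is not the procedure of Hill et al.; it assumes, for illustration only, that the calibration curve is a straight line fitted by ordinary least squares, and the spike-in abundances and signals shown are hypothetical values rather than measured data.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = slope * xs + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

def to_frequency(average_difference, slope, intercept):
    """Convert an average difference to a frequency in ppm (floored at 0)."""
    return max((average_difference - intercept) / slope, 0.0)

# Hypothetical spike-in controls: known abundance (ppm) and measured signal.
spike_ppm = [3, 10, 30, 100, 300, 1000]
spike_avg_diff = [35, 70, 170, 520, 1520, 5020]   # here exactly 20 + 5 * ppm

slope, intercept = fit_line(spike_ppm, spike_avg_diff)
frequency_ppm = to_frequency(270, slope, intercept)   # close to 50 ppm
```

Because each array is referred to its own spiked controls, applying the same conversion chip by chip places all arrays on a common ppm scale.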

[0582] Specific transcripts were evaluated further if they met the following criteria. 

First, genes that were designated "absent" by the GENECHIP 3.2 software in all samples 
were excluded from the analysis. Second, in comparisons of transcript levels between 
arrays, a gene was required to be present in at least one of the arrays. Third, for 
comparisons of transcript levels between groups, a Student's t-test was applied to identify a 
subset of transcripts that had significant (p < 0.05) differences in frequency values. In 
certain cases, a fourth criterion, which requires that average fold changes in frequency 
values across the statistically significant subset of genes be 2-fold or greater, was also used. 

[0583] Unsupervised hierarchical clustering of genes and/or arrays on the basis of 

similarity of their expression profiles was performed using the procedure described in Eisen, 
et al., Proc. Nat. Acad. Sci., U.S.A., 95: 14863-14868 (1998). Nearest neighbor prediction 
analysis and supervised cluster analysis were performed using metrics illustrated in Golub et 
al., Science, 286: 531-537 (1999). For hierarchical clustering and nearest neighbor 
prediction analysis, data were log transformed and normalized to have a mean value of zero 
and a variance of one. A Student's t-test was used to compare disease-free PBMC 
expression profiles to renal carcinoma PBMC profiles. In the comparisons, a p value < 0.05 
was used to indicate statistical significance. 
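
The log transformation and normalization to a mean of zero and a variance of one can be sketched in Python as shown below. The base of the logarithm, the floor applied before taking logs, and the use of the population standard deviation are assumptions made for this illustration; the source does not specify them.

```python
import math
import statistics

def standardize_log(frequencies, floor=1.0):
    """Log-transform a profile, then scale it to mean 0 and variance 1.

    `floor` guards against log of zero; base 10 is an arbitrary choice,
    since the standardized result does not depend on the log base.
    """
    logs = [math.log10(max(value, floor)) for value in frequencies]
    mean = statistics.fmean(logs)
    sd = statistics.pstdev(logs)          # population standard deviation
    if sd == 0:
        return [0.0 for _ in logs]        # flat profile: nothing to scale
    return [(value - mean) / sd for value in logs]

profile = standardize_log([10, 100, 1000])   # hypothetical frequencies in ppm
```

After this step every profile contributes on the same scale, so correlation-based clustering is driven by the shape of a profile rather than by its absolute abundance.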

[0584] Expression profiles in various tissues can also be accessed and downloaded 

from the BioExpress database (GeneLogic, Gaithersburg MD). GeneLogic GX2000 
software based analysis tools including fold change analysis and electronic northerns can be 
utilized to calculate fold changes and distribution of expression values, and expression 
profiles for different samples can be exported using the expression analysis tool for further 
analysis in the hierarchical clustering package (Eisen, et al., Proc. Nat. Acad. Sci., U.S.A., 
95:14863-14868(1998)). 

[0585] A k-nearest neighbors approach was used to perform a neighborhood 

analysis of real and randomly permuted data using a correlation metric, 
$P(g,c) = (\mu_1 - \mu_2)/(\sigma_1 + \sigma_2)$, where g is the expression vector of a gene, c is the class vector, 
$\mu_1$ and $\sigma_1$ define the mean expression level and standard deviation of the gene in class 1, and 
$\mu_2$ and $\sigma_2$ define the mean expression level and standard deviation of the gene in class 2. The measures of 
correlation for the 246 most statistically significant upregulated genes of the true defined 
classes (RCC versus disease-free) were compared to the most statistically significant 
measures of correlation observed in randomly permuted class distinctions. The top 1%, 5% 
and median distance measurements of 100 randomly permuted classes compared to the 
observed distance measurements for RCC and disease-free classes are plotted. FIG. 1 
depicts the statistical verification of the RCC disease genes identified in this invention. 
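
The neighborhood analysis can be illustrated with the following Python sketch, which computes the class-separation metric (difference of class means divided by the sum of class standard deviations) for one gene and for randomly permuted class labels. The expression values, the label coding (1 and 2), the use of the sample standard deviation, and the random seed are assumptions for this example only.

```python
import random
import statistics

def signal_to_noise(values, labels):
    """(mean1 - mean2) / (sd1 + sd2) for one gene across labeled samples."""
    class1 = [v for v, label in zip(values, labels) if label == 1]
    class2 = [v for v, label in zip(values, labels) if label == 2]
    spread = statistics.stdev(class1) + statistics.stdev(class2)
    return (statistics.fmean(class1) - statistics.fmean(class2)) / spread

def permuted_scores(values, labels, n_permutations=100, seed=0):
    """Scores obtained after randomly reassigning the class labels."""
    rng = random.Random(seed)
    shuffled = list(labels)
    scores = []
    for _ in range(n_permutations):
        rng.shuffle(shuffled)             # class sizes are preserved
        scores.append(signal_to_noise(values, shuffled))
    return scores

# Hypothetical gene: four class-1 samples followed by four class-2 samples.
values = [12, 15, 14, 13, 5, 6, 4, 7]
labels = [1, 1, 1, 1, 2, 2, 2, 2]
observed = signal_to_noise(values, labels)
null_scores = permuted_scores(values, labels)
```

A gene is of interest when its observed score exceeds the upper percentiles (for example, the top 1% or 5%) of the scores obtained from the permuted class distinctions.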

Example 4. Identification of RCC Disease Genes in Peripheral Blood 

[0586] Tables 6 and 7 list 184 RCC disease genes which are ranked by the number 

of samples in which the gene was called present (# Present). The p-value of the Student's 
t-test ("T-test (p-value)") for each of the 184 RCC disease genes is also listed in Table 6. 
"Present" calls were calculated using GENECHIP 3.2 software by estimating whether a 
transcript was detected in a sample based on the strength of the gene's signal compared to 
background. See GeneChip® Expression Analysis Technical Manual, 701021 Rev.3 (1999- 
2002 Affymetrix, Inc.). 

[0587] The "average difference" values for each transcript were normalized to 

"frequency" values using the scaled frequency normalization method in which the average 
differences for 11 control cRNAs with known abundance spiked into each hybridization 
solution were used to generate a global calibration curve. See Hill et al., Genome Biol., 
2(12):research0055.1-0055.13 (2001), which is incorporated herein in its entirety by 
reference. This calibration was then used to convert average difference values for all 
transcripts to frequency estimates, stated in units of parts per million (ppm) which can 
range, but are not limited to, from 1:300,000 (i.e., 3 ppm) to 1:1000 (1000 ppm). 
[0588] Expression profiling analysis of the 20 disease-free PBMC RNA samples 

and 45 RCC PBMC RNA samples revealed that of the 12,626 transcripts on the HgU95A 
chip, 5,249 transcripts met the initial criteria for further analysis. The initial criteria were 
(1) there was at least one present call, and (2) at least one frequency was over 10 ppm. On 
average, 4023 transcripts were detected as "present" in the 45 RCC PBMCs, while 4254 
expressed transcripts were detected as "present" in the 20 disease-free PBMCs. 
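
The two initial criteria can be expressed as a simple filter, sketched below in Python. The coding of detection calls as "P" (present) and "A" (absent), the transcript names and the frequencies are hypothetical and serve only to illustrate the logic.

```python
def passes_initial_filter(calls, frequencies_ppm, min_ppm=10.0):
    """calls: per-sample 'P' (present) or 'A' (absent); frequencies in ppm."""
    has_present_call = any(call == "P" for call in calls)
    has_high_frequency = any(freq > min_ppm for freq in frequencies_ppm)
    return has_present_call and has_high_frequency

# Hypothetical transcripts measured in three samples.
transcripts = {
    "tx_kept": (["A", "P", "A"], [4.0, 25.0, 8.0]),
    "tx_never_present": (["A", "A", "A"], [12.0, 15.0, 11.0]),
    "tx_low_signal": (["P", "P", "A"], [3.0, 9.0, 2.0]),
}
retained = [name for name, (calls, freqs) in transcripts.items()
            if passes_initial_filter(calls, freqs)]
```

Only the first hypothetical transcript satisfies both criteria; the other two each fail one of them.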
[0589] The percent coefficients of variation (i.e., S.D. / mean frequency x 100) of 

each of the 5,249 original transcript levels across both groups of samples (45 RCC, 20 
disease-free or normal PBMCs) were calculated (% COV). Transcripts were ranked where 
the least variable gene across the RCC samples received an RCC COV Rank = 1 and the 
most variable gene across the RCC samples received an RCC COV Rank = 5249. This 
process was repeated for the 20 disease-free (normal) PBMC samples and the Normal COV 
Rank was calculated in similar fashion, i.e., the least variable gene across the disease-free 
samples received a Norm COV Rank = 1 and the most variable gene across the 
disease-free samples received a Norm COV Rank = 5249. In addition, fold changes were 
calculated as RCC Average Frequency / Normal Average Frequency, where a number equal 
to or greater than 2.0 was considered to represent a transcript induced in RCC PBMCs. 
Fold changes for each of the 5249 transcripts are depicted in Table 6. The number of 
samples possessing levels greater than 10 ppm ("# Freq > 10") is also presented in Table 6 
for each transcript. Moreover, the number of samples where the transcript was called 
present across the 45 RCC ("# Present RCC"), called present across the 20 Normals ("# 
Present Normal"), present at levels greater than 10 ppm across the 45 RCC ("# Freq > 10 
RCC"), and present at levels greater than 10 ppm across the 20 normals ("# Freq > 10 
Norm") are reported in Tables 6 and 7. 
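
The COV ranking and fold change calculations can be sketched in Python as follows, taking the percent coefficient of variation in its conventional sense (standard deviation divided by mean, times 100). The gene names and frequencies are hypothetical, and the use of the sample standard deviation is an assumption.

```python
import statistics

def percent_cov(frequencies):
    """Percent coefficient of variation: S.D. / mean x 100."""
    return statistics.stdev(frequencies) / statistics.fmean(frequencies) * 100.0

def cov_ranks(frequencies_by_gene):
    """Rank 1 = least variable gene; highest rank = most variable gene."""
    cov = {gene: percent_cov(freqs) for gene, freqs in frequencies_by_gene.items()}
    ordered = sorted(cov, key=cov.get)
    return {gene: rank for rank, gene in enumerate(ordered, start=1)}

def fold_change(rcc_frequencies, normal_frequencies):
    """RCC average frequency divided by normal average frequency."""
    return statistics.fmean(rcc_frequencies) / statistics.fmean(normal_frequencies)

# Hypothetical frequencies (ppm) for three genes in four RCC samples.
rcc = {"geneA": [20, 22, 18, 20], "geneB": [5, 40, 15, 20], "geneC": [30, 31, 29, 30]}
normal_geneA = [10, 10, 10, 10]

rcc_ranks = cov_ranks(rcc)
geneA_fold = fold_change(rcc["geneA"], normal_geneA)
```

In this example geneC is the least variable gene and geneB the most variable, and geneA meets the 2.0 fold change threshold for a transcript induced in RCC PBMCs.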

[0590] A fold change analysis and Student's t test (two-tailed distribution; two- 

sample unequal variance) identified transcripts differentially expressed between RCC 
PBMCs and disease-free PBMCs. Transcript levels of the 184 RCC disease genes shown in 
Tables 6 and 7 differed on average by 2-fold or greater between disease-free and RCC 
PBMCs with an unadjusted p-value below 0.001 in a t-test between the groups. Of these, 
132 transcripts were expressed in at least 15% of the PBMC samples (present in 10 or more 
of the 65 profiles). 
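
The unequal-variance (Welch) form of the two-sample t-test can be sketched in Python as shown below. The sketch returns the t statistic and the Welch-Satterthwaite degrees of freedom only; obtaining the two-tailed p-value requires the cumulative t distribution, which is left to a statistics package. The sample values are hypothetical.

```python
import math
import statistics

def welch_t(sample_a, sample_b):
    """Welch's t statistic and degrees of freedom for two samples."""
    n_a, n_b = len(sample_a), len(sample_b)
    # Per-group variance of the mean (sample variance divided by group size).
    vm_a = statistics.variance(sample_a) / n_a
    vm_b = statistics.variance(sample_b) / n_b
    t = (statistics.fmean(sample_a) - statistics.fmean(sample_b)) / math.sqrt(vm_a + vm_b)
    df = (vm_a + vm_b) ** 2 / (vm_a ** 2 / (n_a - 1) + vm_b ** 2 / (n_b - 1))
    return t, df

# Hypothetical frequencies (ppm) for one transcript.
rcc_values = [12, 15, 14, 13]
normal_values = [5, 6, 4, 7]
t_stat, dof = welch_t(rcc_values, normal_values)
```

Where SciPy is available, `scipy.stats.ttest_ind(rcc_values, normal_values, equal_var=False)` is one way to obtain the corresponding two-tailed p-value.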

[0591] Furthermore, the possibility that the observed differences in expression 

profiles of CPT-purified RCC PBMC pellets and CPT-purified disease-free PBMC pellets 
were simply due to differences in the proportions of specific cell types was investigated. A correlation coefficient for each gene's expression level with 
the level of granulocytes, lymphocytes and monocytes measured in PBMC samples was 
calculated. The relative correlation of expression of each gene with the level of each cell 
type was ranked to determine whether the disease-associated transcripts detected in RCC 
PBMCs were over-represented in a given cell population. The relative rank (out of the 
5,249 transcripts passing the initial data filter) correlations of each transcript with the 
absolute numbers of granulocytes, lymphocytes and monocytes measured in PBMC samples 
were obtained. These analyses indicate that the vast majority of disease-associated 
transcripts identified in PBMCs of RCC patients were not simply correlated with specific 
cell subpopulations in peripheral blood. 
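
The cell-type correlation analysis can be sketched in Python as follows. The source does not state which correlation coefficient was used; the Pearson coefficient is assumed here, and the gene names, expression values and monocyte counts are hypothetical.

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def rank_by_correlation(expression_by_gene, cell_counts):
    """Correlate each gene with a cell count; rank 1 = strongest positive."""
    correlations = {gene: pearson(values, cell_counts)
                    for gene, values in expression_by_gene.items()}
    ordered = sorted(correlations, key=correlations.get, reverse=True)
    ranks = {gene: rank for rank, gene in enumerate(ordered, start=1)}
    return correlations, ranks

# Hypothetical monocyte counts and expression levels in four samples.
monocyte_counts = [0.2, 0.4, 0.6, 0.8]
expression = {
    "gene_tracks_monocytes": [10, 20, 30, 40],
    "gene_unrelated": [15, 30, 30, 15],
    "gene_inverse": [40, 30, 20, 10],
}
correlations, ranks = rank_by_correlation(expression, monocyte_counts)
```

If disease-associated transcripts merely reflected a shift in one cell population, they would cluster near the top of such a ranking for that cell type.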

[0592] An initial unsupervised cluster analysis approach which hierarchically 

groups samples and genes based on correlation coefficients (Eisen et al., supra) was 
performed using the 5,249 transcripts passing the initial filtering criteria. FIG. 2 depicts a 
dendrogram of sample relatedness using the expression values of the expressed genes. RCC patient 
PBMC expression profiles were denoted by white bars and disease-free volunteer PBMC 
expression profiles were indicated by black bars. The dendrogram grouped the majority of 
RCC PBMCs (42/45) into a single RCC-specific cluster while expression patterns of 
disease-free PBMCs and a small subset of RCC PBMCs (3/45) formed a separate cluster. 
[0593] Among the 184 RCC disease genes listed in Tables 6a and 7, there were 

several inflammatory-related genes, including Toll-like receptor 2, galectin-3, IL-1 receptor 
antagonist, and aquaporin-9, a water channel implicated in leukocyte migration. The 
unchanged levels of many other cytokines, chemokines and their respective receptors 
between normal and RCC PBMCs suggest that a specific, rather than global, activation of 
PBMCs constituted part of the disease signature in RCC peripheral blood. 
[0594] A substantial number of the transcripts detected as significantly changed in 

PBMCs from RCC patients possess a significant degree of variability across the RCC 
PBMC profiles. This indicates that while the levels of these transcripts were significantly 
distinct from levels in normal PBMCs, there was relative heterogeneity of expression of 
these transcripts across RCC patients. It will be of great interest to determine whether any 
of these disease-associated yet significantly variable transcripts in RCC PBMCs will be 
correlated with any clinical categories of response, once clinical indices of outcome and 
follow-up are satisfactorily measured in these patients. 

yj12d03.s1 Soares placenta 
Nb2HP Homo sapiens cDNA 
clone IMAGE:148517 3' similar 
to WNT6 


gene, partial cds 


cotransporter [R.norvegicus] 


methyltransferase-1 


myosin, light polypeptide 5, 
regulatory 


putative non-coding transcript 
(DiGeorge critical region 5) 


G protein-coupled receptor 3 


Human BRCA2 region, mRNA 
sequence CG011 


inositol polyphosphate-4- 
phosphatase, type 1, 107kD 


cholinergic receptor, nicotinic, 
alpha polypeptide 4 


H.sapiens mRNA for unknown 


mucin 1, transmembrane 


D site of albumin promoter 
(albumin D-box) binding 
protein 


phosphodiesterase I/nucleotide 
pyrophosphatase 2 (autotaxin) 


Phosphatidic acid phosphatase 
type 2b 


core-binding factor, runt 
domain, alpha subunit 2; 
translocated to, 3 


Homo sapiens clone 24741 
mRNA sequence 


double homeobox, 1 



(alanine aminotransferase) 


GenBank 
Accession 
Number 




L03785 


X91348 


L32831 


AI732885 


U96919 


U62433 


AF055000 


J05581 


U48213 


D45421 


AF017786 


AB010419 


AF070587 


AJ001481 


U70732 




[Table residue: column headers "Common with" (entries "Activated T Cells"), "# Freq > 10 Normal", "# Freq > 10 RCC", "# Present Normal" and "# Present RCC"; the per-gene values are illegible in the source.] 


Gene Annotation 


hydrolase; glyoxalase 2 


fatty acid binding protein 5 
(psoriasis-associated) 


Kell blood group precursor 
(McLeod phenotype) 


tumor necrosis factor alpha- 
inducible protein with leucine 
zipper domains; Huntingtin 
interacting protein L 


v-ets avian erythroblastosis 
virus E26 oncogene homolog 2 


ralA binding protein 1 


Tubulin, Beta 


S100 calcium-binding protein 
A11 (calgizzarin) 


2,3-bisphosphoglycerate mutase 


ATPase, Ca++ transporting, 
plasma membrane 1 


Tubulin, Beta 


plasminogen activator inhibitor, 
type II (arginine-serpin) 


selenium binding protein 1 


erythrocyte membrane protein 
band 4.9 (dematin) 


basigin 


nucleoside phosphorylase 


viral oncogene homolog 


S100 calcium-binding protein P 


integrin, beta 3 (platelet 
glycoprotein IIIa, antigen 
CD61) 


epithelial protein up-regulated 
in carcinoma, membrane 
associated protein 17 


tropomyosin 1 (alpha) 


solute carrier family 4, anion 
exchanger, member 1 


interleukin 1 receptor, type I 


superoxide dismutase 2, 
mitochondrial 


small inducible cytokine A7 
(monocyte chemotactic protein 
3) 


ESTs, Weakly similar to !!!! 
ALU CLASS E WARNING 
ENTRY !!!! [H.sapiens] 


interferon (alpha, beta and 
omega) receptor 2 


chemokine (C-C motif) receptor 


epiregulin 


nuclear factor (erythroid- 
derived 2), 45kD 


band 4.2 


myosin, light polypeptide 4, 
alkali; atrial, embryonic 


matrix metalloproteinase 9 
(gelatinase B, 92kD gelatinase, 
92kD type IV collagenase) 


Fc fragment of IgA, receptor for 


Homo sapiens clone 23953 
mRNA sequence 


ESTs, Highly similar to CGI-56 
protein [H.sapiens] 


yj12d03.s1 Soares placenta 
Nb2HP Homo sapiens cDNA 
clone IMAGE:148517 3' similar 
to WNT6 


BCL2-like 1 


Human thrombospondin-1 gene, 
partial cds 


MAX dimerization protein 


peptidyl arginine deiminase, 
type II 


MAX-interacting protein 1 


plasminogen activator, 
urokinase receptor 


KIAA0750 gene product 


solute carrier family 1 
(glutamate/neutral amino acid 
transporter), member 4 


three prime repair exonuclease 1 


pyridoxal (pyridoxine, vitamin 
B6) kinase 


tropomyosin 1 (alpha) 


enigma (LIM domain protein) 


Homo sapiens mRNA; cDNA 
DKFZp434F152 (from clone 
DKFZp434F152) 


nucleoporin 214kD (CAIN) 


solute carrier family 6 
(neurotransmitter transporter, 
creatine), member 8 


tropomodulin 


cytosolic acyl coenzyme A 
thioester hydrolase 


Human lipocortin (LIP) 2 
pseudogene mRNA, complete 
cds-like region 


peroxisome proliferative 
activated receptor, gamma 


T54 protein 


ESTs, Weakly similar to 38kDa 
splicing factor [H.sapiens] 


interleukin 1 receptor accessory 
protein 


Cluster Incl U97067; Homo 


GenBank 
Accession 
Number 




AJ243797 


U89606 


M19267 


L35240 


AL096737 


D14689 


U36341 


M77016 


U91316 


M62896 


L40904 


U66359 


W2893L 


AB006537 


U97067 





































# Freq> 1 
Normal 


o 


o 


o 


o 


o 


o 


o 


o 


o 


o 




o 


o 


o 




o 


#Freq> 10 
RCC 




































r- 


o 


m 








oo 






VO 


vo 


OO 




VO 




-4— » 


































#Pre>< 
Norn 


o 


o 






o 


o 








O 




o 




o 




o 


# Present 
RCC 






m 


m 






m 








<N 


m 










Gene Annotation 


integrin, alpha 7 


colony stimulating factor 2 
(granulocyte-macrophage) 


diacylglycerol O-acyltransferase 
(mouse) homolog 


hydroxyprostaglandin 
dehydrogenase 15-(NAD)


troponin T1, skeletal, slow


polymerase (RNA) II (DNA 
directed) polypeptide J (13.3kD) 


tumor suppressor deleted in oral 
cancer-related 1 


peroxisome proliferative 
activated receptor, delta 


long fatty acyl-CoA synthetase 
2 gene 


adaptor-related protein complex 
1, gamma 2 subunit 


ER to nucleus signalling 1 


coactivator-associated arginine 
methyltransferase-1 


myosin, light polypeptide 5, 
regulatory 


putative non-coding transcript 
(DiGeorge critical region 5) 


G protein-coupled receptor 3 


Human BRCA2 region, mRNA 
sequence CG011


Accession
Number


inositol polyphosphates-
phosphatase, type 1, 107kD


cholinergic receptor, nicotinic, 
alpha polypeptide 4 


H.sapiens mRNA for unknown 
liver orphan


GenBank
Accession
Number


Example 5. Probing the Molecular Basis of the RCC Disease Gene Classification Set
in PBMCs



[0595] The expression profiles in RCC PBMCs were compared with profiles in RCC
primary tumors. In these experiments the difference averages (rather than standard-curve
normalized frequencies) of the 20 normal PBMCs and 45 RCC PBMCs from the present
study were normalized using the GeneLogic GLGC normalization algorithm with difference
averages detected in expression profiles of 57 normal kidney biopsies and 43 RCC tumor
tissue biopsies. The expression profiles of normal kidney and primary RCC tumor tissues
were downloaded in silico from the BioExpress database (GeneLogic, Gaithersburg, MD). To
identify any genes induced in both RCC PBMCs and RCC tumor tissue relative to normal
controls, gene expression values for the 165 arrays were clustered according to the method of
Eisen et al., Proc. Natl. Acad. Sci. U.S.A., 95: 14863-14868 (1998). In these analyses only
genes were clustered and the original order of the arrays as depicted was conserved in order
to visually detect batteries of genes with patterns of regulation consistent with RCC tumor
markers present in RCC peripheral blood.
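By way of non-limiting illustration, clustering genes only while conserving the order of the arrays can be sketched as follows. The sketch is a simplified average-linkage agglomeration on Pearson correlation, in the spirit of but not identical to the method of Eisen et al.; the function names and the toy profiles are hypothetical and are not taken from the study.

```python
from itertools import combinations
from statistics import mean


def pearson(x, y):
    """Pearson correlation between two equal-length expression vectors."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0


def cluster_gene_order(profiles):
    """Reorder genes by average-linkage agglomeration on Pearson correlation.

    profiles maps a gene identifier to its values across the arrays. The
    arrays stay in their supplied order, so only the gene axis is clustered.
    """
    clusters = [[gene] for gene in profiles]

    def similarity(c1, c2):
        # Average pairwise correlation between members of two clusters.
        return mean(pearson(profiles[a], profiles[b]) for a in c1 for b in c2)

    while len(clusters) > 1:
        # Merge the two most similar clusters.
        i, j = max(
            combinations(range(len(clusters)), 2),
            key=lambda pair: similarity(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return clusters[0]


# Hypothetical toy data: four genes measured on four arrays.
toy_profiles = {
    "gene_a": [1.0, 2.0, 3.0, 4.0],
    "gene_c": [4.0, 3.0, 2.0, 1.0],
    "gene_b": [2.0, 4.0, 6.0, 8.5],
    "gene_d": [8.0, 6.0, 4.0, 2.5],
}
gene_order = cluster_gene_order(toy_profiles)
```

In this toy input, co-regulated genes (gene_a with gene_b, gene_c with gene_d) end up adjacent in the returned order, which is what allows batteries of similarly regulated genes to be detected visually.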

[0596] Expression profiles in RCC PBMCs were also compared with profiles in 

PHA-stimulated PBMCs ex vivo. In these experiments the expression profiles of 20 normal 
PBMCs and 45 RCC PBMCs were compared to expression profiles detected in (n=3) 
untreated or 6h PHA-stimulated PBMCs cultured ex vivo. Normalization using a standard 
curve to generate frequencies was performed, and hierarchical clustering of genes was 
subsequently performed. 

[0597] In addition, the expression profiles in RCC PBMCs were compared with
profiles in PBMCs from non-RCC patients with renal failure. The difference averages of the
20 normal PBMCs and 45 RCC PBMCs were normalized using the GeneLogic GLGC
normalization algorithm with difference averages detected in expression profiles of 8
non-RCC renal failure PBMCs downloaded in silico from the BioExpress database (GeneLogic,
Gaithersburg, MD). Hierarchical clustering of genes only was subsequently performed.

[0598] Furthermore, the 184 RCC disease genes listed in Tables 6 and 7 were
compared to the 10 transcripts most strongly up-regulated in RCC tumors (n = 47) relative to
normal kidney tissue (n = 60) using profiles downloaded from the BioExpress database
(GeneLogic, Gaithersburg, MD). The RCC tumor-specific transcripts that possessed the
highest average fold differences in expression between RCC tumor tissue and normal kidney
were unchanged between normal and RCC PBMCs, suggesting that shed RCC tumor cells
did not contribute significantly to the disease-associated transcripts identified in PBMCs
isolated from RCC patients.

[0599] The 184 RCC disease genes listed in Tables 6 and 7 were also compared to
genes differentially expressed between unstimulated CD4+ T cells (n = 3 normal donors) and
CD4+ T cells (n = 3 normal donors) stimulated ex vivo with anti-CD3 and anti-CD28 in
culture. Stimulated CD4+ T cells possessed 14 transcripts that were greater than 2-fold
changed in the same direction (induced or repressed) as the disease-associated transcripts in
RCC PBMCs, as indicated in the last column of Table 7.
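As a non-limiting sketch, the test for transcripts changed more than 2-fold in the same direction in two comparisons can be written as follows. The fold change is assumed here to be a ratio of class means; the text does not state how it was computed. The function names, gene labels, and values are hypothetical.

```python
from statistics import mean


def direction(case, control, threshold=2.0):
    """Classify one transcript as 'induced', 'repressed', or None.

    case and control are lists of positive expression values; the fold
    change is taken as the ratio of their means (an assumption).
    """
    fold = mean(case) / mean(control)
    if fold > threshold:
        return "induced"
    if fold < 1.0 / threshold:
        return "repressed"
    return None


def concordant(genes, comparison_a, comparison_b, threshold=2.0):
    """Genes changed beyond the threshold in the same direction in both comparisons.

    Each comparison maps gene -> (case_values, control_values).
    """
    shared = []
    for gene in genes:
        if gene not in comparison_a or gene not in comparison_b:
            continue
        da = direction(*comparison_a[gene], threshold=threshold)
        db = direction(*comparison_b[gene], threshold=threshold)
        if da is not None and da == db:
            shared.append((gene, da))
    return shared


# Hypothetical values: disease PBMCs vs. normal, and stimulated vs. unstimulated T cells.
pbmc = {
    "gene_a": ([40, 44, 48], [10, 11, 12]),   # induced
    "gene_b": ([5, 6, 7], [20, 22, 24]),      # repressed
    "gene_c": ([10, 11, 12], [10, 11, 12]),   # unchanged
}
t_cell = {
    "gene_a": ([90, 100, 110], [30, 30, 30]),  # induced
    "gene_b": ([50, 60, 70], [20, 20, 20]),    # induced (opposite direction)
    "gene_c": ([30, 33, 36], [10, 11, 12]),    # induced
}
overlap = concordant(["gene_a", "gene_b", "gene_c"], pbmc, t_cell)
```

Only gene_a is retained in this toy input, because gene_b changes in opposite directions and gene_c is unchanged in the first comparison.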

[0600] The 184 RCC disease genes listed in Tables 6 and 7 were further compared to
genes differentially expressed between PBMCs from non-RCC end-stage renal failure
patients (n = 9 individuals) and PBMCs from normal volunteers (n = 4 individuals). Of these,
9 transcripts differentially expressed in PBMCs from renal failure patients were also
disease-associated transcripts in RCC PBMCs, as indicated in the last column of Table 7. Thus, the
184 RCC disease genes listed in Tables 6 and 7 contain a subset of markers commonly
involved in immune responses measured ex vivo (CD4+ T cell activation) and in responses of
circulating leukocytes to renal dysfunction observed in vivo. Without limiting the present
invention to any particular theory, these results support a hypothesis that the expression levels
of at least a subset of the disease-associated genes observed in RCC PBMCs may result from
an activation of circulating T cells and/or other leukocytes in response to the presence of the
tumor. In addition, it is possible that the regulation of another subset of disease-associated
transcripts detected in RCC PBMCs may be due to alterations in leukocyte expression
profiles in response to renal dysfunction in the RCC patients.

Example 6. Classification of RCC and RCC-Free Status Using Gene Expression 
Profiles in Peripheral Blood Cells 

[0601] To build and train the RCC disease classifiers, 70% of the RCC PBMC
expression patterns (n = 31) and 70% of the disease-free PBMC expression patterns (n = 14)
were randomly selected and used as the training set. The remaining RCC and disease-free
PBMC expression patterns were used as the test set. A relative class separation metric was
used to calculate a measure of correlation and rank order the genes with expression levels
most highly correlated with the classification vector characteristic of the training set. This
measure of correlation is composed of mean expression values and variances.

[0602] Classification of the test set of samples was performed using a weighted
voting method to classify the remaining PBMC expression profiles as characteristic of RCC
or disease-free PBMCs. In this method the expression level of each gene in the classifier set
contributes to an overall prediction strength which determines the classification of the
sample. The prediction strength in this example is essentially a combined variable that
indicates the number of "votes" for either one class or another, and can vary between 0
(narrow margin of victory) and 1 (wide margin of victory) in favor of the predicted class. To
quantitate the accuracy of this prediction method, a value of 0.3 was imposed as the
prediction strength threshold above which calls could confidently be made.
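A minimal, non-limiting sketch of a weighted voting step of this kind is given below. It assumes the commonly described form in which each gene casts a vote equal to its weight multiplied by the distance of the sample's expression level from a decision boundary, and in which the prediction strength is the winning vote total minus the losing vote total, divided by their sum. The weights, boundaries, and gene labels are hypothetical.

```python
def weighted_vote(sample, params):
    """Classify one sample by weighted voting.

    sample: dict gene -> expression level.
    params: dict gene -> (weight, boundary). A positive weight means that
    expression above the boundary votes for 'RCC' (assumed convention).
    Returns (predicted_class, prediction_strength in [0, 1]).
    """
    votes = {"RCC": 0.0, "disease-free": 0.0}
    for gene, (weight, boundary) in params.items():
        v = weight * (sample[gene] - boundary)
        if v >= 0:
            votes["RCC"] += v
        else:
            votes["disease-free"] += -v
    winner = max(votes, key=votes.get)
    loser = "disease-free" if winner == "RCC" else "RCC"
    total = votes[winner] + votes[loser]
    strength = (votes[winner] - votes[loser]) / total if total else 0.0
    return winner, strength


# Hypothetical two-gene classifier: one gene elevated and one repressed in RCC.
params = {"gene_up": (2.0, 10.0), "gene_down": (-1.0, 20.0)}

clear_call = weighted_vote({"gene_up": 14.0, "gene_down": 18.0}, params)
weak_call = weighted_vote({"gene_up": 12.0, "gene_down": 23.0}, params)

THRESHOLD = 0.3  # prediction strength above which a call is treated as confident
confident = [call for call in (clear_call, weak_call) if call[1] > THRESHOLD]
```

In this toy input both genes vote for RCC in the first sample, giving a prediction strength of 1, whereas the votes nearly cancel in the second sample, so its call falls below the 0.3 threshold.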
[0603] In this example, the accuracy of prediction for any given classifier gene set is
defined as the percentage of calls with prediction strengths greater than 0.3 that also classify
samples correctly. The class predictors used in this example include (1) a 2-gene class
predictor consisting of TLR2 and EEF1A2, (2) a 4-gene class predictor consisting of TLR2,
LGALS3, EEF1A2, and BRF2, (3) a 6-gene class predictor consisting of TLR2, LGALS3,
DKFZP586E1621, EEF1A2, BRF2, and SNRPG, (4) an 8-gene class predictor consisting of
TLR2, LGALS3, DKFZP586E1621, SOD2, EEF1A2, BRF2, SNRPG, and NUMA1, (5) a
10-gene class predictor consisting of TLR2, LGALS3, DKFZP586E1621, SOD2, DUSP6,
EEF1A2, BRF2, SNRPG, NUMA1, and AKR1B1, (6) a 12-gene class predictor consisting of
TLR2, LGALS3, DKFZP586E1621, SOD2, DUSP6, KIAA0669, EEF1A2, BRF2, SNRPG,
NUMA1, AKR1B1, and SMARCE1, (7) a 14-gene class predictor consisting of TLR2,
LGALS3, DKFZP586E1621, SOD2, DUSP6, KIAA0669, IL1RN, EEF1A2, BRF2, SNRPG,
NUMA1, AKR1B1, SMARCE1, and MSF, (8) a 16-gene class predictor consisting of TLR2,
LGALS3, DKFZP586E1621, SOD2, DUSP6, KIAA0669, IL1RN, KIAA0410, EEF1A2,
BRF2, SNRPG, NUMA1, AKR1B1, SMARCE1, MSF, and PTMA, (9) an 18-gene class
predictor consisting of TLR2, LGALS3, DKFZP586E1621, SOD2, DUSP6, KIAA0669,
IL1RN, KIAA0410, T54, EEF1A2, BRF2, SNRPG, NUMA1, AKR1B1, SMARCE1, MSF,
PTMA, and PSMD3, and (10) a 20-gene class predictor consisting of EEF1A2, TLR2, BRF2,
LGALS3, SNRPG, DKFZP586E1621, NUMA1, SOD2, AKR1B1, DUSP6, SMARCE1,
KIAA0669, MSF, IL1RN, PTMA, KIAA0410, PSMD3, T54, C1QBP, and OSR1.
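The accuracy definition given above can be illustrated with the following non-limiting sketch, in which only calls whose prediction strength exceeds the threshold enter the denominator. The function name and the example calls are hypothetical and do not reproduce any result of the study.

```python
def thresholded_accuracy(calls, threshold=0.3):
    """Percentage of confident calls that are correct.

    calls: list of (predicted_class, true_class, prediction_strength).
    Calls at or below the threshold are excluded from the calculation.
    Returns (accuracy_percent, n_confident); accuracy is None when no call
    exceeds the threshold.
    """
    confident = [(p, t) for p, t, s in calls if s > threshold]
    if not confident:
        return None, 0
    correct = sum(1 for p, t in confident if p == t)
    return 100.0 * correct / len(confident), len(confident)


# Hypothetical calls: 12 confident and correct, 1 confident and wrong,
# and 1 low-strength wrong call that is ignored.
calls = (
    [("RCC", "RCC", 0.8)] * 12
    + [("RCC", "disease-free", 0.5)]
    + [("disease-free", "RCC", 0.1)]
)
accuracy, n_confident = thresholded_accuracy(calls)
```

Because low-strength calls are excluded, the denominator can differ from one classifier gene set to another even when the same samples are scored.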
[0604] The accuracy of prediction for both the training sets and the test sets of RCC
PBMCs with each set of predictor genes was calculated. Calculating the accuracy of
classification for a training set indicates how uniformly the predictor gene set was positively
correlated with each individual sample in the training set, whereas calculating the accuracy of
prediction for a test set indicates how well the expression of this gene set predicted the
identity of individual samples in an "unknown" group. Table 8 illustrates the accuracy of
prediction with each of the above-described class predictors. Classifier gene sets using 10 or
more genes in the weighted voting algorithm yielded 100% accuracy in prediction of the test
set. These studies demonstrate the feasibility of performing simple pair-wise prediction of
RCC versus RCC-free status using expression patterns found in a limited number of gene
transcripts in the compartment of peripheral blood.



Table 8. Prediction Accuracy of the Class Predictors of the Present Invention

Genes in the       Prediction Accuracy     Prediction Accuracy
Class Predictor    for Training Set (%)    for Test Set (%)

2                  71.88                   100.00
4                  75.00                   92.31
6                  82.76                   90.91
8                  88.89                   84.62
10                 92.59                   100.00
12                 92.59                   100.00
14                 93.10                   100.00
16                 92.86                   100.00
18                 93.10                   100.00
20                 92.86                   100.00



[0605] FIG. 3 shows a summary of the training set cross validation results for
predictor gene sets of increasing size. A subset of RCC and normal PBMC samples (70%)
was used as a "training set" to generate classifier gene sets, and then each predictor set was
evaluated by cross validation to identify the predictor set with high accuracy for classification
of the samples in the training set. Genecluster's default correlation metric (Golub et al.,
supra) was used to identify genes with expression levels most highly correlated with the
classification vector characteristic of the training set. All 5,249 genes meeting the initial
filter criteria were screened using this approach.
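The selection of a 70% training set and the ranking of genes against the class distinction can be sketched, in a non-limiting way, as follows. The sketch assumes a signal-to-noise statistic of the form (mean_A - mean_B) / (sd_A + sd_B), which is one widely used class separation metric built from class means and spreads; the exact metric applied by the software is not restated in the text. The function names, the seed, and the toy values are hypothetical.

```python
import random
from statistics import mean, stdev


def split_70_30(samples, seed=0):
    """Randomly assign 70% of the samples (rounded down) to the training set."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    k = len(shuffled) * 7 // 10  # integer arithmetic avoids rounding surprises
    return shuffled[:k], shuffled[k:]


def signal_to_noise(class_a, class_b):
    """Assumed class separation metric: difference of means over summed SDs."""
    return (mean(class_a) - mean(class_b)) / (stdev(class_a) + stdev(class_b))


def rank_genes(expr_a, expr_b):
    """Rank genes by absolute separation between two classes.

    expr_a and expr_b map gene -> list of training-set values in each class.
    Returns (genes ordered from most to least separating, score per gene).
    """
    scores = {g: signal_to_noise(expr_a[g], expr_b[g]) for g in expr_a}
    ranked = sorted(scores, key=lambda g: abs(scores[g]), reverse=True)
    return ranked, scores


# 45 and 20 samples yield training sets of 31 and 14 samples, respectively.
rcc_train, rcc_test = split_70_30(range(45))
normal_train, normal_test = split_70_30(range(20))

# Hypothetical training values for three genes.
rcc = {"up": [10, 11, 12], "flat": [5, 6, 7], "down": [1, 2, 3]}
normal = {"up": [1, 2, 3], "flat": [5, 6, 8], "down": [5, 6, 7]}
ranked, scores = rank_genes(rcc, normal)
```

A positive score marks a gene elevated in the first class and a negative score a gene repressed in it, so a classifier set can be drawn from both ends of the ranking.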

[0606] Prediction was also performed in Genecluster using the weighted voting
method. In this method, the expression level of each gene in the classifier set contributes to
an overall vote on the classification of the sample (Slonim et al., supra). The prediction
strength is a combined variable that indicates the support for one class or the other, and can
vary between 0 (narrow margin of victory) and 1 (wide margin of victory) in favor of the
predicted class. Predictor sets containing between 2 and 20 genes were evaluated by
leave-one-out cross validation to identify the predictor set with the highest accuracy for
classification of the samples in the training set (FIG. 3).

[0607] The 8-gene predictor set (89% accuracy) was selected for test set prediction.
The 8-gene set consists of TLR2, LGALS3, DKFZP586E1621, SOD2, EEF1A2, BRF2,
SNRPG, and NUMA1. FIG. 4 shows the relative expression levels of the 8 predictive genes
in the training set. Each gene is represented by its respective qualifier. Graphically presented
are the 4 genes elevated in RCC relative to normal PBMCs (TLR2, LGALS3,
DKFZP586E1621, and SOD2) and the 4 genes repressed in RCC relative to normal PBMCs
(EEF1A2, BRF2, SNRPG, and NUMA1). The expression level increases from blue to red.

[0608] The individual prediction confidence scores for each sample in the training set
using this 8-gene classifier set are presented in FIG. 5A. For illustrative purposes, a positive
sign was assigned to the prediction strengths resulting in votes for RCC and a negative sign
was assigned to prediction strengths resulting in votes for normal PBMCs. A leave-one-out
cross validation was performed and the prediction strengths were calculated for each sample
in the training set. Training set samples were ordered in the same order as in FIG. 4.
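For illustration only, leave-one-out cross validation of a training set can be sketched as follows. Each sample is withheld in turn, a classifier is built from the remaining samples, and the withheld sample is then predicted. A nearest-class-mean classifier is used here purely as a hypothetical stand-in for the weighted voting classifier; the names and values are assumptions.

```python
from statistics import mean


def nearest_mean_classifier(train):
    """Build a stand-in classifier from (features, label) pairs.

    The returned function assigns a sample to the class whose mean profile
    is closest in squared Euclidean distance.
    """
    by_label = {}
    for features, label in train:
        by_label.setdefault(label, []).append(features)
    centroids = {
        label: [mean(column) for column in zip(*rows)]
        for label, rows in by_label.items()
    }

    def predict(features):
        return min(
            centroids,
            key=lambda label: sum(
                (f - c) ** 2 for f, c in zip(features, centroids[label])
            ),
        )

    return predict


def leave_one_out_accuracy(samples, build=nearest_mean_classifier):
    """Fraction of samples predicted correctly when each is withheld in turn."""
    correct = 0
    for i, (features, label) in enumerate(samples):
        predict = build(samples[:i] + samples[i + 1:])  # train without sample i
        correct += predict(features) == label
    return correct / len(samples)


# Hypothetical two-gene profiles for six training samples.
training = [
    ([1.0, 5.0], "normal"), ([1.2, 4.8], "normal"), ([0.9, 5.1], "normal"),
    ([3.0, 2.0], "RCC"), ([3.2, 1.9], "RCC"), ([2.9, 2.2], "RCC"),
]
loo_accuracy = leave_one_out_accuracy(training)
```

Because the withheld sample never contributes to the classifier that predicts it, the resulting accuracy is less optimistic than one obtained by scoring the training samples with a classifier built from all of them.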
[0609] FIG. 5B illustrates the prediction results for the remaining test set of RCC and 

normal PBMC samples using the 8 gene predictor set. On the test set, the predicted class 
matched the true class in all cases, though for one of the 19 test samples the prediction 
strength was negligible. These studies demonstrate the feasibility of predicting RCC versus 
disease-free status using expression patterns found in a limited number of gene transcripts in 
mononuclear cells from peripheral blood. 

Example 7. Differentially Expressed Genes in RCC Tumor Tissues and Non-RCC 
End-Stage Renal Failure Patients 

[0610] Expression profiles of RCC PBMCs were compared with expression profiles
of RCC tumor tissue or PBMCs from patients with renal failure. In each comparison, a
multivariate (hierarchical clustering) analysis was employed to search for co-regulated
batteries of genes between the groups, followed by a fold-change analysis and Student's t-test
to support any findings. In the first analysis, expression profiles of RCC PBMCs were
compared in silico with expression profiles of RCC tumor tissues (n = 43 biopsies) from the
GeneLogic BioExpress database (Gaithersburg, MD). All samples were ordered in a
supervised fashion (i.e., no arrays were clustered) and genes were ordered using a
hierarchical clustering approach to identify gene sets upregulated in both PBMCs of RCC
patients and RCC tumor biopsies compared to disease-free controls. Fold change analysis
identified 24 RNA species that were statistically significant (p<0.05, Student's t-test) and
greater than 2-fold induced in RCC PBMCs relative to disease-free PBMCs and in RCC
tumors relative to disease-free kidney tissue.
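A non-limiting sketch of a combined fold-change and Student's t-test filter is given below. It assumes an equal-variance two-sample t-test, a fold change computed as a ratio of class means, and a two-sided p-value obtained by numerically integrating the t density with Simpson's rule; none of these choices is specified in the text. The function names and values are hypothetical.

```python
import math
from statistics import mean, variance


def t_pdf(t, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = (math.lgamma((df + 1) / 2) - math.lgamma(df / 2)
             - 0.5 * math.log(df * math.pi))
    return math.exp(log_c - (df + 1) / 2 * math.log1p(t * t / df))


def two_sided_p(t, df, steps=2000):
    """Two-sided p-value via Simpson's rule on [0, |t|] (steps must be even)."""
    a = abs(t)
    if a == 0:
        return 1.0
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return max(0.0, 1.0 - 2.0 * total * h / 3.0)


def student_t(x, y):
    """Equal-variance two-sample t statistic and its degrees of freedom.

    Assumes the pooled variance is non-zero.
    """
    nx, ny = len(x), len(y)
    df = nx + ny - 2
    pooled = ((nx - 1) * variance(x) + (ny - 1) * variance(y)) / df
    t = (mean(x) - mean(y)) / math.sqrt(pooled * (1 / nx + 1 / ny))
    return t, df


def significantly_induced(case, control, min_fold=2.0, alpha=0.05):
    """True when case is more than min_fold above control with p < alpha."""
    t, df = student_t(case, control)
    return mean(case) / mean(control) > min_fold and two_sided_p(t, df) < alpha


def induced_in_all(comparisons, min_fold=2.0, alpha=0.05):
    """True when a gene passes the filter in every (case, control) comparison."""
    return all(significantly_induced(c, n, min_fold, alpha) for c, n in comparisons)


# Hypothetical gene: PBMC comparison and tissue comparison.
gene_comparisons = [
    ([10, 12, 11, 13], [4, 5, 4.5, 5.5]),
    ([30, 34, 32, 36], [10, 12, 11, 13]),
]
passes = induced_in_all(gene_comparisons)
```

Requiring the filter to pass in every comparison mirrors the requirement that a transcript be induced both in PBMCs and in tumor tissue relative to the respective disease-free controls.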

[0611] These 24 RNA species correspond to FABP5, SCYA20, ADM, COPEB,
FCGR3B, UNK_M62896, FN1, HMOX1, ITGA7, DGCR5, CBP2, UNK_AL049250,
SLC1A4, MMP9, SLC16A3, LILRB3, FCGR1A, LHFPL2, PLEC1, S100A11, SPOP, CCR1,
TLR2 and KIAA0750, respectively. In addition, these 24 RNA species are capable of
hybridizing under stringent conditions to CPSs 57, 229, 92, 91, 221, 26, 236, 207, 16, 8, 245,
152, 2, 58, 192, 19, 99, 28, 191, 138, 143, 61, 1, and 148, respectively.

[0612] In the second analysis, PBMCs from non-RCC end-stage renal failure patients
(n = 8 individuals) were compared with PBMCs from disease-free volunteers and patients with
RCC. Hierarchical clustering of genes in these groups of samples identified several clusters
of genes that appear to be similarly regulated between advanced RCC patients and patients
with end-stage renal failure. Fold change analysis identified a plurality of RNA transcripts
that were statistically significant (p<0.05, Student's t-test) and greater than 2-fold induced in
RCC PBMCs and in PBMCs from non-RCC patients with renal failure relative to
disease-free PBMCs. The CPSs capable of hybridizing to these RNA transcripts under stringent
conditions are depicted in Table 9. The genes corresponding to the CPSs are also indicated.

Table 9. RCC Disease Genes that Are Differentially Expressed in Non-RCC Renal Failure
Patient Relative to Disease-free PBMCs

CPS No.        Corresponding Genes

92             ADM
91             COPEB
34             AQP9
222            PTGS2
244            STIP1
53             SOD2
151            PDXK
18             IL1RN
21             ANXA5
109            IFIT4
211            IL1B
201            GRO1
104            PLAUR
130            NP
58             MMP9
192            SLC16A3
19             LILRB3
99             FCGR1A
28             LHFPL2
191            PLEC1
138            S100A11
143            SPOP
61             CCR1
1              TLR2
148            KIAA0750
105            CDC34
[illegible]    [illegible]
10             ETS2
125            MAD
[illegible]    GPR3
[illegible]    [illegible]
[illegible]    PRF1
[illegible]    [illegible]
12             TCFL1
[illegible]    [illegible]
146            [illegible]
165            DOC-1R
62             C8FW
128            PDI2
133            GEF-2
147            TNNT1
111            BSG
84             IL17R
[illegible]    [illegible]
115            RALBP1
195            RNASE2
25             TPM1
40             BLVRB
35             APS
17             PPARD
157            NFE2
14             IL1RAP
173            S100A12
174            CD9
[illegible]    [illegible]
135            HAGH
247            NCF1
250            FLOT1
94             ITGA2B
148            KIAA0750
194            FKBP8
4              DUSP6
87             CBFA2T3



[0613] The genes and CPSs listed in Table 9 can be used as markers for renal failure 

and other types of renal dysfunction. 

Example 8. Prediction of RCC Status Versus Disease-free Volunteers and Patients 
with Other Solid Tumors 

[0614] In this analysis, expression profiles were compared simultaneously among 

four classes of PBMCs which include RCC PBMCs, disease-free PBMCs, prostate cancer 
PBMCs, and head and neck cancer PBMCs. An initial hierarchical analysis demonstrated the 
global transcriptional relationships between the expanded database of PBMC expression 
profiles. 70% of the samples were then used as a training set, and a multi-class correlation 
metric was employed to identify and rank the genes most highly correlated with each class of 
PBMC expression profile (RCC, disease-free, prostate carcinoma, head and neck) in the 
database. A 20-gene classifier was determined. These genes and the corresponding CPSs are 
illustrated in Table 10. This 20-gene set can be used to predict each class versus all other 
classes. 

[0615] The ability of this gene set to predict the remaining 30% of the samples as
RCC versus non-RCC was calculated. The gene set was able to predict each remaining
PBMC profile in the test set as RCC or non-RCC with 89% or 92% accuracy, respectively.
As appreciated by one of ordinary skill in the art, a subset of these 20 genes, such as 2, 4, 6,
8, 10, 12, 14, 16 or 18 genes, can be used to predict RCC from non-RCC. Non-RCC includes
other solid tumors, such as prostate cancer or head/neck cancer.

Table 10. Gene Set For Predicting RCC Versus Disease-free Volunteers
and Patients with Other Solid Tumors

CPS No.    Corresponding Genes

268        CD44
269        CRADD
270        CCRL2
71         KIAA0837
271        KIAA0707
272        KIAA1113
64         EREG
273        UNK_AL050119
17         PPARD
37         CTSL
59         ATP2B1
274        UNK_AF052115
275        MITF
276        STAT3
264        KIAA0410
277        TPD52L2
278        UNK_AI732885
31         MARCO
69         LOC64116 (also referred to as UNK_AL049963)
50         PDNP2
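The RCC versus non-RCC accuracy calculation described in this example can be sketched, without limitation, as follows. Four-class predictions are collapsed into RCC and non-RCC, and accuracy is then computed separately for each group. The function name and the example predictions are hypothetical and do not reproduce the reported percentages.

```python
def rcc_vs_non_rcc_accuracy(predictions):
    """Per-group accuracy after collapsing classes into RCC and non-RCC.

    predictions: list of (predicted_class, true_class), where the classes may
    be 'RCC', 'disease-free', 'prostate', or 'head/neck'.
    Returns a dict of accuracy percentages for each group that has samples.
    """
    def collapse(label):
        return "RCC" if label == "RCC" else "non-RCC"

    totals = {"RCC": 0, "non-RCC": 0}
    hits = {"RCC": 0, "non-RCC": 0}
    for predicted, true in predictions:
        group = collapse(true)
        totals[group] += 1
        hits[group] += collapse(predicted) == group
    return {g: 100.0 * hits[g] / totals[g] for g in totals if totals[g]}


# Hypothetical test-set calls.
example_predictions = [
    ("RCC", "RCC"), ("RCC", "RCC"), ("RCC", "RCC"), ("prostate", "RCC"),
    ("disease-free", "disease-free"),
    ("prostate", "head/neck"),   # wrong tumor type, but still non-RCC
    ("RCC", "prostate"),         # non-RCC sample called RCC
    ("prostate", "prostate"),
]
group_accuracy = rcc_vs_non_rcc_accuracy(example_predictions)
```

Under this collapsing, confusion among the non-RCC classes does not count as an error; only calls that cross the RCC versus non-RCC line do.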



Example 9. Identification of A Solid Tumor-Free Predictor Gene Set 

[0616] Supervised analysis of expression profiles in disease-free PBMCs and PBMCs
from different solid tumors was conducted. PBMC expression profiles from 3 out of 5
head/neck cancer patients, 14 out of 20 disease-free volunteers, 11 out of 15 prostate cancer
patients, and 32 out of 45 RCC patients were classified, and a k-nearest neighbors algorithm
was used to calculate the genes most highly correlated with each class distinction. The 19 top
genes with expression patterns most highly correlated with these PBMCs from head/neck
patients, disease-free volunteers, prostate cancer patients, and RCC patients were identified.
The top 19 genes thus identified were then used to determine the accuracy of prediction of
solid tumor versus solid tumor-free status in the remaining PBMC samples. A weighted voting
method was used to determine the prediction strength for each sample. These 19 genes are
listed in Table 11.

Table 11. A Solid Tumor-Free Predictor Gene Set

CPS No.    Corresponding Genes    Entrez Accession No.

258        NUMA1                  Z11584
285        CXCR4                  L06797
107        IL10RA                 U00672
286        M9                     AB019392
287        FAU                    X65923
256        BRF2                   U07802
288        RPS6                   X67309
255        EEF1A2                 X70940
289        BAG5                   AB020680
259        AKR1B1                 X15414
290        UNK_AL022721           AL022721
266        C1QBP                  M69039
291        DKZP586E0820           AL050147
292        NONO                   U02493
265        PSMD3                  D67025
131        UNK_N74607             N74607
293        UNK_AI743507           AI743507
294        MAPKAPK5               AF032437
295        UNK_U79297             U79297
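The configuration of the k-nearest neighbors algorithm used in this example is not described. As a generic, non-limiting sketch of nearest-neighbor class assignment, a sample may be assigned the majority class among its k closest training profiles. The Euclidean distance, the value of k, and the toy profiles below are assumptions made for illustration.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Assign the majority label among the k training profiles nearest to query.

    train: list of (features, label); query: list of feature values.
    Distance is Euclidean (an assumption; other metrics can be substituted).
    """
    nearest = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


# Hypothetical two-gene profiles.
knn_training = [
    ([1.0, 1.0], "solid tumor"), ([1.2, 0.9], "solid tumor"),
    ([0.8, 1.1], "solid tumor"),
    ([3.0, 3.0], "solid tumor-free"), ([3.1, 2.9], "solid tumor-free"),
    ([2.9, 3.2], "solid tumor-free"),
]
call_a = knn_predict(knn_training, [1.1, 1.0])
call_b = knn_predict(knn_training, [3.0, 3.1])
```

An odd value of k avoids tied votes when only two classes are present.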



[0617] The foregoing description of the present invention provides illustration and 

description, but is not intended to be exhaustive or to limit the invention to the precise one 
disclosed. Modifications and variations are possible consistent with the above teachings or 
may be acquired from practice of the invention. Thus, it is noted that the scope of the 
invention is defined by the claims and their equivalents.